I just read Jon Udell's post on What RSS users want: consistent one-click subscription where he wrote

Saturday's Scripting News asked an important question: What do users want from RSS? The context of the question is the upcoming RSS Winterfest... Over the weekend I received a draft of the RSS Winterfest agenda along with a request for feedback. Here's mine: focus on users. In an October posting from BloggerCon I present video testimony from several of them who make it painfully clear that the most basic publishing and subscribing tasks aren't yet nearly simple enough.

Here's more testimony from the comments attached to Dave's posting:

One message: MAKE IT SIMPLE. I've given up on trying to get RSS. My latest attempt was with Friendster: I pasted in the "coffee cup" and ended up with string of text in my sidebar. I was lost and gave up. I'm fed up with trying to get RSS. I don't want to understand RSS. I'm not interested in learning it. I just want ONE button to press that gives me RSS.... [Ingrid Jones]

Like others, I'd say one-click subscription is a must-have. Not only does this make it easier for users, it makes it easier to sell RSS to web site owners as a replacement/enhancement for email newsletters... [Derek Scruggs]

For average users RSS is just too cumbersome. What is needed to make is simpler to subscribe is something analog to the mailto tag. The user would just click on the XML or RSS icon, the RSS reader would pop up and would ask the user if he wants to add this feed to his subscription list. A simple click on OK would add the feed and the reader would confirm it and quit. The user would be back on the web site right where he was before. [Christoph Jaggi]

Considering that the most popular news aggregators for both the Mac and Windows platforms support the feed "URI" scheme, including SharpReader, RSS Bandit, NewsGator, FeedDemon (in the next release), NetNewsWire, Shrook, WinRSS and Vox Lite, I wonder how long it'll take the various vendors of blogging tools to wake up and smell the coffee. Hopefully by the end of the year, complaints like those listed above will be a thing of the past.


Categories: RSS Bandit | Technology

January 20, 2004
@ 03:33 PM

One of the biggest problems facing designers of XML vocabularies is how to make them extensible, and how to design them so that applications which process those vocabularies do not break in the face of changes between versions of the vocabulary. One of the primary benefits of using XML for building data interchange formats is that the APIs and technologies for processing XML are quite resistant to additions to vocabularies. If I write an application which loads RSS feeds looking for item elements and then processes their link and title elements, using any of the various technologies and APIs for processing XML such as SAX, the DOM or XSLT, it is quite straightforward to build it so that it is resistant to changes in the RSS spec or extensions to it, since the link and title elements always appear in a feed.
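To illustrate the point, here is a minimal sketch in Python using the standard library's ElementTree (the post itself mentions SAX, DOM and XSLT, not this API, and the feed content below is made up). A consumer that looks only for the elements it knows about keeps working when the feed grows a new element:

```python
import xml.etree.ElementTree as ET

# A hypothetical RSS 2.0 feed containing an element this consumer was
# never written against (<newRating/>); names and values are invented.
FEED = """<rss version="2.0">
  <channel>
    <item>
      <title>Hello</title>
      <link>http://example.com/1</link>
      <newRating>5</newRating>
    </item>
  </channel>
</rss>"""

def read_items(feed_xml):
    """Extract (title, link) pairs, silently ignoring unknown elements."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        results.append((item.findtext("title"), item.findtext("link")))
    return results

print(read_items(FEED))  # [('Hello', 'http://example.com/1')]
```

The unknown `<newRating>` element is simply never asked for, so the extension causes no breakage.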

On the other hand, actually describing such extensibility using the most popular XML schema language, W3C XML Schema, is difficult because of several limitations in its design which make it very hard to describe extension points in a vocabulary in a way that is idiomatic to how XML vocabularies are typically processed by applications. Recently, David Orchard, a standards architect at BEA Systems, wrote an article entitled Versioning XML Vocabularies which does a good job of describing the types of extensibility XML vocabularies should allow and points out a number of the limitations of W3C XML Schema that make it difficult to express these constraints in an XML schema for a vocabulary. David Orchard has written a followup to this article entitled Providing Compatible Schema Evolution which contains a lot of assertions and suggestions for improving extensibility in W3C XML Schema that mostly jibe with my experiences working as the Program Manager responsible for W3C XML Schema technologies at Microsoft.

The scenario outlined in his post is

We start with a simple use case of a name with a first and last name, and it's schema. We will then evolve the language and instances to add a middle name. The base schema is:

<xs:complexType name="nameType">
  <xs:sequence>
    <xs:element name="first" type="xs:string" />
    <xs:element name="last" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

Which validates the following document:

<name>
  <first>Dave</first>
  <last>Orchard</last>
</name>
And the scenario asks how to validate documents such as the following, whether or not the new schema with the extension is available to the receiver:

<name>
  <first>Dave</first>
  <middle>B</middle>
  <last>Orchard</last>
</name>

At this point I'd like to note that this is a versioning problem, which is a special instance of the extensibility problem. The extensibility problem is how one describes an XML vocabulary in a way that allows producers to add elements and attributes to the core vocabulary without causing problems for consumers that may not know about them. The versioning problem is specific to when the added elements and attributes actually are from a subsequent version of the vocabulary (i.e. a version 2.0 server talking to a version 1.0 client). The additional wrinkle in the specific scenario outlined by David Orchard is that elements from newer versions of the vocabulary have the same namespace as elements from the old version.

A strategy for simplifying the problem statement would be if additions in subsequent versions of the vocabulary were in a different namespace (i.e. a version 2.0 document would have elements from the version 1.0 namespace and the version 2.0 namespace), which would make the versioning problem the same as the extensibility problem. However most designers of XML vocabularies would balk at creating a vocabulary whose core uses elements from multiple namespaces [once past version 2.0], and often cite that this makes it more cumbersome for applications that process such vocabularies because they have to deal with multiple namespaces. This is a tradeoff which every XML vocabulary designer should consider during the design and schema authoring process.

David Orchard takes a look at various options for solving the extensibility problem outlined above using current XML Schema design practices. 

Type extension

Use type extension or substitution groups for extensibility. A sample schema is:

<xs:complexType name="NameExtendedType">
  <xs:complexContent>
    <xs:extension base="tns:nameType">
      <xs:sequence>
        <xs:element name="middle" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

This requires that both sides simultaneously update their schemas and breaks backwards compatibility. It only allows the extension after the last element.

There is a [convoluted] way to ensure that both sides do not have to update their schemas. The producer can send a <name> element that contains an xsi:type attribute which has the NameExtendedType as its value. The problem is then how the client learns the definition of the NameExtendedType type, which is solved by the root element of the document containing an xsi:schemaLocation attribute which points to a schema for that namespace which includes the schema from the previous version. There are at least two caveats to this approach: (i) the client has to trust the server, since it is using a schema defined by the server, not the client's, and (ii) since the xsi:schemaLocation attribute is only a hint, it is likely the validator may ignore it since the client would already have provided a schema for that namespace.

Change the namespace name or element name

The author simply updates the schema with the new type. A sample is:

<xs:complexType name="nameType">
  <xs:sequence>
    <xs:element name="first" type="xs:string" />
    <xs:element name="middle" type="xs:string" minOccurs="0"/>
    <xs:element name="last" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

This does not allow extension without changing the schema, and thus requires that both sides simultaneously update their schemas. If a receiver has only the old schema and receives an instance with middle, it will not be valid under the old schema.

Most people would state that this isn't really extensibility since [to XML namespace aware technologies and APIs] the names of all elements in the vocabulary have changed. However, for applications that key off the local-name of the element or are unsavvy about XML namespaces, this is a valid approach that doesn't cause breakage. Ignoring namespaces, this approach is simply adding more stuff in a later revision of the spec, which is generally how XML vocabularies evolve in practice.
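A quick sketch of what "keying off the local-name" looks like in practice, using Python's ElementTree (the vocabulary, namespace URI and values below are invented for illustration). Even if version 2.0 moves everything into a new namespace, a consumer that compares only local names keeps working:

```python
import xml.etree.ElementTree as ET

# A hypothetical version 2.0 document whose elements live in a brand-new
# namespace; a namespace-unsavvy consumer never notices the change.
DOC_V2 = """<name xmlns="urn:example:name:v2">
  <first>Dave</first>
  <middle>B</middle>
  <last>Orchard</last>
</name>"""

def local_name(tag):
    # ElementTree spells namespace-qualified tags as "{uri}local"
    return tag.rsplit("}", 1)[-1]

def find_by_local_name(xml_text, wanted):
    """Return the text of every element whose local name matches."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter() if local_name(el.tag) == wanted]

print(find_by_local_name(DOC_V2, "first"))  # ['Dave']
```

The cost, of course, is that such a consumer cannot distinguish same-named elements from different namespaces, which is exactly why namespace-aware processors consider this "not really extensibility".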

Use wildcard with ##other

This is a very common technique. A sample is:

<xs:complexType name="nameType">
  <xs:sequence>
    <xs:element name="first" type="xs:string" />
    <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="last" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

The problems with this approach are summarized in Examining elements and wildcards as siblings. A summary of the problem is that the namespace author cannot extend their schema with extensions and correctly validate them because a wildcard cannot be constrained to exclude some extensions.

I'm not sure I agree with David Orchard's summary of the problem here. The problem described in the article he linked to is that a schema author cannot refine the schema in subsequent versions to contain optional elements and still preserve the wildcard. This is due to the Unique Particle Attribution constraint, which states that a validator MUST always have only one choice of which schema particle it validates an element against. Given an element declaration and a wildcard in sequence, the schema validator has a CHOICE of two particles it could validate an element against if its name matches that of the element declaration. There are a number of disambiguating rules the W3C XML Schema working group could have come up with to allow greater flexibility for this specific case, such as (i) using a first-match rule or (ii) allowing exclusions in wildcards.

Use wildcard with ##any or ##targetnamespace

This is not possible with optional elements, due to XML Schema's Unique Particle Attribution rule; the rationale is described in the Versioning XML Languages article. An invalid schema sample is:

<xs:complexType name="nameType">
  <xs:sequence>
    <xs:element name="first" type="xs:string" />
    <xs:any namespace="##any" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="last" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

The Unique Particle Attribution rule does not allow a wildcard adjacent to optional elements or before elements in the same namespace.

Agreed. This is invalid.

Extension elements

This is the solution proposed in the versioning article. A sample of the pre-extended schema is:

<xs:complexType name="nameType">
  <xs:sequence>
    <xs:element name="first" type="xs:string" />
    <xs:element name="extension" type="tns:ExtensionType" minOccurs="0" maxOccurs="1"/>
    <xs:element name="last" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="ExtensionType">
  <xs:sequence>
    <xs:any processContents="lax" minOccurs="1" maxOccurs="unbounded" namespace="##targetnamespace"/>
  </xs:sequence>
</xs:complexType>

An extended instance is

<name>
  <first>Dave</first>
  <extension>
    <middle>B</middle>
  </extension>
  <last>Orchard</last>
</name>

This is the only solution that allows backwards and forwards compatibility, and correct validation using the original or the extended schema. The article shows a number of the difficulties remaining, particularly the cumbersome syntax and the potential for some documents to be inappropriately valid. This solution also has the problem that each subsequent version increases the nesting by one level. Personally, I think that the difficulties, including potentially deep nesting levels, are not major compared to the ability to do backwards and forwards compatible evolution with validation.

The primary problem I have with this approach is that it is a very unidiomatic way to process XML, especially when combined with the nesting that accumulates across successive versions. For example, take a look at

<name>
  <first>Dave</first>
  <extension>
    <middle>B</middle>
    <extension>
      <prefix>Mr.</prefix>
    </extension>
  </extension>
  <last>Orchard</last>
</name>

Imagine if this were the versioning strategy that had been used with HTML, RSS or DocBook. That gets real ugly, real fast. Unfortunately this is probably the best you can do if you want to use W3C XML Schema to strictly define an XML vocabulary with extensibility yet allow backwards & forwards compatibility.
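To see just how unidiomatic the processing becomes, here is a sketch (in Python's ElementTree, chosen only for illustration) of what a consumer has to do to read the doubly-nested instance above: recursively unwrap every <extension> layer before it can see a flat list of the actual content:

```python
import xml.etree.ElementTree as ET

# The doubly-extended instance from the post: version N+2 content is
# buried two <extension> wrappers deep.
DOC = """<name>
  <first>Dave</first>
  <extension>
    <middle>B</middle>
    <extension>
      <prefix>Mr.</prefix>
    </extension>
  </extension>
  <last>Orchard</last>
</name>"""

def flatten_extensions(element):
    """Yield every non-<extension> child, recursing into nested
    <extension> wrappers so the consumer sees one flat list."""
    for child in element:
        if child.tag == "extension":
            yield from flatten_extensions(child)
        else:
            yield child

root = ET.fromstring(DOC)
print([(c.tag, c.text) for c in flatten_extensions(root)])
# [('first', 'Dave'), ('middle', 'B'), ('prefix', 'Mr.'), ('last', 'Orchard')]
```

Every consumer of every version of the vocabulary has to carry this unwrapping logic, which is precisely the kind of busywork XML processing APIs normally spare you.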

David Orchard goes on to suggest a number of potential additions to future versions of W3C XML Schema which would make it easier to use in defining extensible XML vocabularies. However, given that my personal opinion is that adding features to W3C XML Schema is not only trying to put lipstick on a pig but also trying to build a castle on a foundation of sand, I won't go over each of his suggestions. My recent suggestion to some schema authors at Microsoft about solving this problem is that they should have two validation phases in their architecture. The first phase does validation according to W3C XML Schema rules while the other performs validation of "business rules" specific to their scenarios. Most non-trivial vocabularies end up having such an architecture anyway, since there are a number of document validation capabilities missing from W3C XML Schema, so schema authors shouldn't be too focused on trying to force-fit their vocabulary into the various quirks of W3C XML Schema.

For example, one could handle the original scenario with a type definition such as

<xsd:complexType name="nameType">
  <xsd:choice minOccurs="1" maxOccurs="unbounded">
    <xsd:element name="first" type="xsd:string" />
    <xsd:element name="last" type="xsd:string" minOccurs="0"/>
    <xsd:any namespace="##other" processContents="lax" />
  </xsd:choice>
</xsd:complexType>

where the validation layer above the W3C XML Schema layer ensures that an element doesn't occur twice (i.e. there can't be two <first> elements in a <name>). It adds more code to the clients & servers but it doesn't result in butchering the vocabulary either.
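Such a second validation phase is trivial to write in application code. A minimal sketch in Python (the language choice and the helper below are mine, not anything from the post) of the uniqueness rule that the loose xsd:choice content model cannot express:

```python
import xml.etree.ElementTree as ET

def check_unique_children(xml_text):
    """Second validation phase ("business rules"): reject documents where
    any child element name repeats, e.g. two <first> elements inside
    <name>. Simplified: it treats all child names alike, not just those
    from the target vocabulary's namespace."""
    root = ET.fromstring(xml_text)
    seen = set()
    for child in root:
        if child.tag in seen:
            return False
        seen.add(child.tag)
    return True

ok = "<name><first>Dave</first><last>Orchard</last></name>"
bad = "<name><first>Dave</first><first>Bob</first></name>"
print(check_unique_children(ok), check_unique_children(bad))  # True False
```

The schema layer guarantees well-formed, loosely-typed content; a dozen lines of application code enforce the constraint the schema language couldn't.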


Categories: XML

 You know what we should do? Send up a Mars mission and once they're up in space, call them and say, "You guys can't reenter the atmosphere until you develop a cure for AIDS. Get crackin;

C'mon I bet if you asked people in Africa if they wanted us to go to Mars, they'd say yes--because it's important for humanity to reach ever upward. It's inspirational. We're at our best when we dare to dream a...GAAAH! I just puked in my space helmet.

I read somewhere that the cost of going to Mars may eventually total up to $170 billion, which is nowhere close to the $12 billion the US President has stated will flow into NASA's coffers over the next 5 years to help finance the Mars dream. I don't want to knock the US government's spending on AIDS (supposedly $1 billion this year), but aren't there significant, higher priority problems on Earth that need tackling before one starts dabbling in interplanetary conquest?

Gil Scott-Heron's poem Whitey's on the Moon is still quite relevant today. I guess the more things change, the more they stay the same.


In a recent post entitled XML For You and Me, Your Mama and Your Cousin Too I wrote

The main problem is that there are a number of websites which have the same information but do not provide a uniform way to access this information and when access mechanisms to information are provided do not allow ad-hoc queries. So the first thing that is needed is a shared view (or schema) of what this information looks like which is the shared information model Adam talks about...

Once an XML representation of the relevant information users are interested in has been designed (i.e. the XML schema for books, reviews and wishlists that could be exposed by sites like Amazon or Barnes & Nobles) the next technical problem to be solved is uniform access mechanisms... Then there's deployment, adoption and evangelism...

We still need a way to process the data exposed by these web services in arbitrary ways. How does one express a query such as "Find all the CDs released between 1990 and 1999 that Dare Obasanjo rated higher than 3 stars"?.. 

At this point, if you are like me, you might suspect that having the web service endpoints return the results of canned queries, which can then be post-processed by the client, may be more practical than expecting to be able to ship arbitrary SQL/XML, XQuery or XPath queries to web service endpoints.
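The "canned query plus client-side post-processing" idea can be sketched in a few lines of Python. Here the hypothetical service has already returned all of one user's CD ratings (the record layout, titles and ratings are invented), and the client applies the specific filter from the question above:

```python
# Pretend this list is the response to a canned "all my rated CDs" query
# from a hypothetical web service; fields and values are made up.
cds = [
    {"title": "A", "released": 1992, "rating": 4},
    {"title": "B", "released": 1988, "rating": 5},
    {"title": "C", "released": 1995, "rating": 2},
]

# Client-side post-processing: "CDs released between 1990 and 1999
# rated higher than 3 stars".
matches = [cd["title"] for cd in cds
           if 1990 <= cd["released"] <= 1999 and cd["rating"] > 3]
print(matches)  # ['A']
```

The service only has to support one simple, cacheable query shape; the expressive power lives on the client.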

The main problem with what I've described is that it takes a lot of effort. Coming up with standardized schema(s) and distributed computing architecture for a particular industry then driving adoption is hard even when there's lots of cooperation let alone in highly competitive markets.

A few days ago I got a response to this post from Michael Brundage, author of XQuery: The XML Query Language and a lead developer of the XML<->relational database technologies the WebData XML team at Microsoft produces, on a possible solution to this problem that doesn't require lots of disparate parties to agree on schemas, data model or web service endpoints. Michael wrote

Dare, there's already a solution to this (which Adam created at MS five years ago) -- virtual XML views to unify different data sources. So Amazon and BN and every other bookseller comes up with their own XML format. Somebody else comes along and creates a universal "bookstore" schema and maps each of them to it using an XML view. No loss of performance in smart XML Query implementations.

And if that universal schema becomes widely adopted, then eventually all the booksellers adopt it and the virtual XML views can go away. I think eventually you'll get this for documents, where instead of translating WordML to XHTML (as Don is doing), you create a universal document schema and map both WordML and XHTML into it. (And if the mappings are reversible, then you get your translators for free.)

This is basically putting an XML Web Service front end that supports some degree of XML query on aggregator sites such as AddALL or MySimon. I agree with Michael that this would be a more bootstrappable approach to the problem than trying to get a large number of sites to support a unified data model, query interface and web service architecture.

Come to think of it, we're already halfway to creating something similar for querying information in RSS feeds, thanks to sites such as Feedster and Technorati. All that is left is for either site, or others like them, to provide richer APIs for querying, and one would have the equivalent of an XML view of the blogosphere (God, that is such a pretentious word) which you could query to your heart's delight.



Categories: XML

January 19, 2004
@ 06:10 AM

I just checked my Outlook Calendar for tomorrow and it looks like I have about six hours of meetings.



Categories: Life in the B0rg Cube

For my last post via w::bloggar I used the configuration settings described in the dasBlog documentation on the Blogger API. Weirdly enough, my post showed up without a title. Torsten Rendelmann comes to the rescue with the alternate instructions for posting to your dasBlog weblog with w::bloggar that supports titles and categories. Hope this works.

Categories: Das Blog

I'm in the process of tweaking the RSS Bandit installer and implementing a stopgap measure for supporting posting to one's blog from RSS Bandit while waiting for a SOAP version of the ATOM API. The next release of RSS Bandit will ship with a plugin for posting about a particular blog entry to your blog via w::bloggar, similar to the w.bloggar plugin for SharpReader.

RSS Bandit already supports Luke's plugin but most people don't know about it, so I decided to implement a similar plugin [as opposed to redistributing Luke's, since I didn't see anything in the license allowing free redistribution] and add it during the install process. PS: This is my first post from w::bloggar. Hope it works.


Categories: RSS Bandit

January 16, 2004
@ 02:37 AM

From Yahoo! News we learn Outsourcing Contributes To IT Salaries' Downward Spiral  

Overall, the premium paid for IT workers with specific skills was 23 percent lower in 2003 than in 2001, and the pay for certification in particular skills dropped 11 percent, Foote Partners LLC said.
In a yearlong study of 400 Fortune 1000 companies, researchers found that by 2006, the organizations expected from 35 percent to 45 percent of their current full-time IT jobs to go to workers overseas, David Foote, president and chief research officer for Foote Partners, said.

"That showed a definite declining onshore workforce--fewer jobs for IT people in this country," Foote said.

Perhaps it is time to go back to school and get started on my backup plan of being a lawyer specializing in intellectual property law.


Categories: Ramblings

January 15, 2004
@ 08:24 AM

A Newsgator press release from last week reads

Subscription Synchronization

Users who subscribe to NewsGator Online Services can now synchronize their subscriptions across multiple machines. This is an industry first - NewsGator 2.0 for Outlook and NewsGator Online Services are the first commercially available tools to provide this capability in such a flexible manner. This sophisticated system ensures that subscriptions follow users wherever they go, users never have to read the same content twice (unless they choose to), and even supports multiple subscription lists so users can have separate, but overlapping, subscription lists at home and at the office.

Interesting. Synchronizing subscriptions for a news reader across multiple machines doesn't strike me as unprecedented functionality that Newsgator pioneered, let alone an industry first. The first public attempt at doing this that I've seen was Dave Winer's subscription harmonizer, which seemed more of a prototype than an actual product expected to be used by regular users. I implemented and shipped the ability to synchronize subscriptions across multiple machines in RSS Bandit about 2 months ago. As for providing an aggregator that supports this feature alongside a commercial site that hosts feed synchronization information, I believe Shrook has Newsgator beat by about a month, if the website is to be believed (I don't have a Mac to test whether it actually works as advertised).

I find it unfortunate that we seem to be headed for a world where multiple proprietary, non-interoperable solutions exist for providing basic functionality that users take for granted with other technologies like email. This was the impetus for starting work on Synchronization of Information Aggregators using Markup (SIAM). Unfortunately, between my day job, the girlfriend and trying to get another release of RSS Bandit out the door, I haven't had time to brush up the spec and actually work on an implementation. It'll be a few weeks before I can truly focus on SIAM; hopefully it'll be worth waiting for and will gain some traction amongst aggregator developers.


Categories: RSS Bandit | XML

Just saw the following headline at SoufOaklin.com: Disgruntled Asian Tattoo Artist Inks His Revenge

Pitt junior Brandon Smith wanted a tattoo that proclaimed his manliness, so he decided to get the Chinese characters for “strength” and “honor” on his chest. After 20 minutes under the needle of local tattoo artist Andy Sakai, he emerged with the symbol for “small penis” embedded in his flesh.

“I had it for months before I knew what it really meant,” Smith said. “Then I went jogging through the Carnegie Mellon campus and a group of Asian kids started laughing and calling me ‘Shorty.’ That’s when I knew something was up.”

Sakai, an award-winning tattoo artist, was tired of seeing sacred Japanese words, symbols of his heritage, inked on random white people. So he used their blissful ignorance to make an everlasting statement. Any time a customer came to Sakai's home studio wanting Japanese tattooed on them, he modified it into a profane word or phrase.

     “All these preppy sorority girls and suburban rich boys think they’re so cool ‘cause they have a tattoo with Japanese characters. But it doesn’t mean shit to them!” Sakai said. “The dumbasses don’t even realize that I’ve written ‘slut’ or ‘pervert’ on their skin!”

I'm surprised that reports of actions like this are not more widespread. I keep waiting for someone to start the Japanese version of Engrish.com that makes fun of all the folks in the USA who have misspelled Japanese characters on their T-shirts or tattooed on their skin, the same way Engrish.com does for the misspelled, grammatically incorrect English that shows up all over the place in Japan.

I've always thought it was really ghetto (i.e. ignorant) to have characters in a language you can't freaking understand tattooed on your skin. Anyone who's ever done this needs to be awarded 100 ghettofabulous points when they pass Go! and should also collect a free copy of Kardinal Offishall's UR Ghetto. Dang!


Categories: Ramblings