I've written the first draft of the specification for the "feed" URI scheme. From the abstract:

This document specifies the "feed" URI (Uniform Resource Identifier) scheme for identifying data feeds used for syndicating news or other content from an information source such as a weblog or news website. In practice, such data feeds will most likely be XML documents containing a series of news items representing updated information from a particular news source.

The purpose of this scheme is to enable one-click subscription to syndication feeds in a straightforward, easy-to-implement and cross-platform manner. One-click subscription using the "feed" URI scheme is currently supported by NetNewsWire, Shrook, SharpReader and RSS Bandit. The author of NewsGator has indicated that support for one-click subscription using the "feed" URI scheme will exist in the next version.

Any feedback on the draft specification would be appreciated.

Update: Graham Parks has pointed out in the comments to this post that URIs of the form "feed://http://www.example.com/rss.xml" are not compliant with RFC 2396. A fix will be folded into the next draft of the spec.
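
For what it's worth, here's a rough sketch (my own illustration, not anything from the draft spec) of how an aggregator might normalize "feed" URIs into plain HTTP URLs it can fetch. The handling of the bare feed://host/path form is my assumption about how clients treat it:

# A sketch of normalizing "feed" URIs to HTTP URLs; not from the spec.
def feed_uri_to_http(uri):
    """Map feed://http://..., feed:http://... or feed://host/path
    to an http URL a news aggregator can fetch."""
    if not uri.startswith('feed:'):
        raise ValueError('not a feed URI: %r' % uri)
    rest = uri[len('feed:'):]
    if rest.startswith('//http://') or rest.startswith('//https://'):
        # the form Graham Parks noted is not RFC 2396-compliant
        return rest[2:]
    if rest.startswith('http://') or rest.startswith('https://'):
        return rest
    # bare authority form, assumed to imply HTTP: feed://host/path
    return 'http://' + rest.lstrip('/')

assert feed_uri_to_http('feed://http://www.example.com/rss.xml') == 'http://www.example.com/rss.xml'
assert feed_uri_to_http('feed:http://www.example.com/rss.xml') == 'http://www.example.com/rss.xml'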


 

Categories: Technology

Chris Sells complained that a recent interview of Don Box by Mary Jo Foley is "a relatively boring interview" because "Mary Jo doesn't dig for any dirt and Don doesn't volunteer any". He's decided to fix this by proposing an alternate interview where folks send in their favorite questions and he picks the 10 best and forwards them to Don (kinda like Slashdot interviews). Chris offers some seed questions but they are actually much lamer than any of the ones Mary Jo asked, so I suspect his idea of questions that dig for dirt is different from mine.

I drafted 10 questions and picked the 3 least controversial for my submissions to the Don Box interview pool.

  1. People often come up with euphemisms for an existing word or phrase that has become "unpleasant" which, although they technically mean something different from the previous terminology, are used interchangeably. A recent example of this is the replacement of "black" with "African American" in the modern American lexicon when describing people of African descent.

    I suspect something similar has happened with XML Web Services and Service Oriented Architecture. Many seem to think that the phrases are interchangeable when on the surface it seems the former is just one instance of the latter. To you, what is the difference between XML Web Services and Service Oriented Architectures?

  2. For a short while you were active in the world of weblogging technologies, you tried to come up with an RSS profile and were working on a blogging tool with Yasser Shohoud and Martin Gudgin. In recent times, you have been silent about these past activities. What sparked your interest in weblogging technologies and why does that interest seem to have waned?

  3. What team would you not want to work for at Microsoft and why?

These were my tame questions but I get to hang with Don sometime this week so I'll ask him some of the others in person. I hope one of my questions gets picked by Chris Sells.


 

Categories: Life in the B0rg Cube | XML

Where else do you get to see movie clips of illustrious American celebrities in ads for household products they wouldn't be caught endorsing in the United States? Japander.com, of course. The front page of the website reads

Pander:n., & v.t. 1. go-between in clandestine amours, procurer; one who ministers to evil designs. 2 v.i. minister (to base passions or evil designs, or person having these)

Japander:n.,& v.t. 1. a western star who uses his or her fame to make large sums of money in a short time by advertising products in Japan that they would probably never use. ~er (see sinecure, prostitute) 2. to make an ass of oneself in Japanese media.

The clips are all crazy weird, from Arnold Schwarzenegger pimping energy drinks and cup o' noodles to Mel Gibson, Antonio Banderas & Kevin Costner as Subaru pitchmen. I probably spent 30 minutes marvelling at the ads on the site; I never thought I'd see Harrison Ford doing beer commercials. Definitely entertaining stuff.


 

Just stumbled on the following article entitled So, Scrooge was right after all

Conventional economics teaches that gift giving is irrational. The satisfaction or "utility" a person derives from consumption is determined by their personal preferences. But no one understands your preferences as well as you do.

So when I give up $50 worth of utility to buy a present for you, the chances are high that you'll value it at less than $50. If so, there's been a mutual loss of utility. The transaction has been inefficient and "welfare reducing", thus making it irrational. As an economist would put it, "unless a gift that costs the giver p dollars exactly matches the way in which the recipient would have spent the p dollars, the gift is suboptimal".

The big problem I've always had with economics as it was taught to me in school is that its fundamental assumption is that humans make rational decisions when buying and selling goods and services. This is simply not true. The above example is a good one; it would make more sense for everyone involved in the annual gift exchange that is Christmas if people just gave checks and gift certificates instead of buying gifts that the recipients don't want or don't need. Yet this isn't how Christmas gift giving is done in most cases. Then there's the entire field of advertising with its concept of lifestyle ads, which are highly successful and are yet another example that human buying decisions aren't steeped in rationality.

What a crock...


 

December 27, 2003
@ 09:55 PM

An article in the Economist lets us know that research has confirmed that men lose their fiscal prudence in the presence of attractive women

Over 200 young men and women participated in the study, which was divided into three parts. In the first, the participants were asked to respond to nine specific choices regarding potentially real monetary rewards. (At the end of the session, they could roll dice to try to win one of their choices, which would be paid by an appropriately post-dated cheque issued by the university.) In each case, a low sum to be paid out the next day was offered against a higher sum to be paid at a specified future date. Individual responses were surprisingly consistent, according to Dr Wilson, so the “pre-experiment” threshold of each participant was easy to establish.

The volunteers were then asked to score one of four sets of pictures for their appeal: 12 attractive members of the opposite sex; 12 non-lookers; 12 beautiful cars; or 12 unimpressive cars. Immediately after they had seen these images, they were given a new round of monetary reward choices.

As predicted, men who had seen pictures of pretty women discounted the future more steeply than they had done before—in other words, they were more likely to take the lesser sum tomorrow. As Dr Wilson puts it, it was as though a special “I-want-that-now” pathway had been activated in their brains. After all, the money might come in handy immediately. No one else was much affected. (Women did seem to be revved up by nice cars, a result the researchers still find mystifying. But the statistical significance of this finding disappeared after some routine adjustments, and in any case previous work has suggested that women are more susceptible to displays of wealth than men are.)

I guess this explains Abercrombie & Fitch's "alleged" hiring practices. It's always interesting to see stuff you've long taken for granted backed up by research, especially observing how the experiments are conducted.


 

December 27, 2003
@ 07:37 PM

Slashdot has posted a link to Eric Sink's "Make More Mistakes" article on MSDN. One of the anecdotes from the article reads as follows

Circa 1998, betting on Java for a graphical user interface (GUI) application was suicidal. The platform simply didn't have the maturity necessary for building quality user interfaces. I chose Java because I was "head over heels" in love with it. I adored the concept of a cross-platform C-like language with garbage collection. We were hoping to build our Java expertise and make this exciting new technology our specialty.

But Java turned out to be a terrible frustration. The ScrollPane widget did a lousy job of scrolling. Printing support routinely crashed. The memory usage was unbelievably high.

I should have gotten over my religious devotion to semicolons and done this app in Visual Basic.

Lesson learned: Be careful about using bleeding-edge technologies.

There are some on Slashdot who think that Eric learned the wrong lesson from that experience. This post entitled Alteration of rule is from a developer who was in similar circumstances to Eric but had a different outcome

I built a Java/Swing app around the same time. It was a pretty complex user app, not just a simple program - and we managed to completely satisfy the clients and make the program perform acceptably on a very low-end target platform (PII-133 with 32 MB of memory). For what he described (replacing a complex spreadsheet) he should have been able to complete the task.

Why did our app work and his fail? Because we knew Java and Swing well by that point, and knew what was possible with some time spent optimizing. We had a plan in our head for how to reach a target level of performance that would be accepted and more than met that goal.

The lesson he should have learned was "Know your technology well before you embark on a project". The reason why it's so important to learn THAT lesson is that it applies to any project, not just ones using "bleeding edge" technologies. The only difference between an established and bleeding edge technology is the level of support you MIGHT be able to find. And that is not enough of a difference to totally affect either failure or success.

I tend to agree with the Slashdot post. Learning on the job is fine and all, but when big bucks are on the line it's best to go with what you are familiar with, especially if it is tried and tested.


 

Categories: Ramblings

December 27, 2003
@ 07:29 PM

I was talking to some friends of mine over Christmas who happen to still be in college and I learned about a drinking game they played in their sorority house that actually requires sports equipment and hand-eye coordination. I am speaking of Beer Pong, whose description at Barmeister.com reads

The materials you need for this game are some cups of beer, a ping pong ball, and a table. The game is best played with either two people or two teams of two.

Arrange the cups of beer on either side of the table like you are setting up bowling pins. You should have at least six cups on both sides. Each team takes a turn by trying to get the ping pong ball into the other team's cups. If they succeed the other team must drink that cup. The cup is then removed and the rest of the cups are rearranged so that they are close to each other. Each team alternates turns like this.

When all the cups on one side have been cleared, the team that cleared them wins and the other team must finish any cups remaining on the winning team's side.

My friends and I would have loved this game. I wonder what else I missed out on by going to a geeky technical college and shunning the few social activities that occurred on campus.
 

Categories: Ramblings

December 26, 2003
@ 04:07 PM

Mark Pilgrim's most recent entry in his RSS feed contains the following text

The best things in life are not things. (11 words)

Note: The "dive into mark" feed you are currently subscribed to is deprecated. If your aggregator supports it, you should upgrade to my Atom feed, which includes both summaries and full content.

A lot of the ATOM vs. RSS discussion has been mired in childishness and personality conflicts with the main proponents of ATOM claiming that the creation of the ATOM syndication format will be a good thing for users of syndication and blogging software. Now let's pretend this is true and the only people who have to bear the burden are aggregator authors like me who now have to add support for yet another syndication format. Let's see what my users get out of ATOM feeds compared to RSS feeds.

  1. Mark Pilgrim's ATOM feed: As I write this his feed contains the following elements per entry: id, created, issued, modified, link, summary, title, dc:subject and content. The aforementioned elements are equivalent to guid, pubDate, issued, modified, link, description, title, dc:subject and content:encoded/xhtml:body that exist in RSS feeds today. In fact an RSS feed with those elements and Mark Pilgrim's feed will be treated identically by RSS Bandit. The only problematic piece is that his feed contains three dates that express when the entry was issued, when it was modified and when it was created. Most puzzling is that the issued date is before its created date. I have no idea what this distinction means and quite frankly I doubt many people will care.

    Basically, it looks like Mark Pilgrim's ATOM feed doesn't give users anything they couldn't get from an equivalent RSS feed, except the fact that they have to upgrade their news aggregators and deal with potential bugs in the implementations of these features [because there are always bugs in new features].
  2. LiveJournal's ATOM feeds: As I write this a sample feed from LiveJournal (in this case Jamie Zawinski's) contains the following elements per entry: id, modified, issued, link, title, author and content. The aforementioned elements are equivalent to guid, modified, issued, link, title, author/dc:creator and content:encoded/xhtml:body. Comparing this feed to Mark Pilgrim's I already see a bunch of ambiguity, which is not supposed to exist since what ATOM supposedly gives consumers over RSS is that it is better defined and less ambiguous. How are news aggregators supposed to treat the three date types defined in ATOM? In RSS I could always use the pubDate or dc:date; now I have to figure out which of <modified>, <issued> or <created> is the most relevant one to show the user. Another point: what do I do if a feed contains both <content rel="fragment"> and a <summary>? Which one do I show the user?
  3. Movable Type's ATOM feeds: As I write this the Movable Type ATOM template contains the following elements: id, modified, issued, link, title, author, dc:subject, summary and content. The aforementioned elements are equivalent to guid, modified, issued, link, title, author/dc:creator, dc:subject, description and content:encoded/xhtml:body. Again, besides the weirdness with dates (I suspect RSS Bandit will end up treating <modified> as equivalent to <pubDate>; a rough sketch of this mapping follows the list), there isn't anything users get from the ATOM feed that they don't get from the equivalent RSS feed. Interesting; I'd expected that at least one of the first three sample ATOM feeds I looked at would show me why it was worth spending a weekend or more implementing ATOM support in RSS Bandit. 
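
To make the element mapping concrete, here's a minimal sketch of how an aggregator might flatten an ATOM 0.3 entry into its RSS equivalents along the lines described above. This is my own illustration rather than RSS Bandit's actual code; the namespace URI is the ATOM 0.3 one and the atom.xml filename is just a placeholder:

# A rough sketch of the ATOM 0.3 -> RSS element mapping described above.
import xml.etree.ElementTree as ET

ATOM = '{http://purl.org/atom/ns#}'

# ATOM 0.3 element -> the RSS 2.0 element it is treated as equivalent to
MAPPING = {
    'id':       'guid',
    'modified': 'pubDate',      # treat <modified> as the publication date
    'title':    'title',
    'summary':  'description',
    'content':  'content:encoded',
}

def entry_to_rss_item(entry):
    """Flatten an ATOM <entry> into a dict keyed by RSS element names."""
    item = {}
    for atom_name, rss_name in MAPPING.items():
        child = entry.find(ATOM + atom_name)
        if child is not None and child.text:
            item[rss_name] = child.text.strip()
    # <link> carries its value in an href attribute, not in element text
    link = entry.find(ATOM + 'link')
    if link is not None:
        item['link'] = link.get('href', '')
    return item

feed = ET.parse('atom.xml').getroot()
for entry in feed.findall(ATOM + 'entry'):
    print(entry_to_rss_item(entry))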

The fundamental conceit of the ATOM effort is that they think writing specifications is easy. Many of its proponents deride RSS for being ambiguous and not well defined, yet they are producing a more complex specification with more significant ambiguities in it than I've seen in RSS. I actually have a mental list of significant issues with ATOM that I haven't even posted yet; the ones I mentioned above were just from glancing at the aforementioned feeds. My day job involves reading or writing specs all day. Most of the specs I read were produced either by the W3C or by folks within Microsoft. Every one of them contains contradictions and ambiguities, and lacks crucial information for determining behavior in edge cases. Some are better than others but none of them is ever well-defined enough. Every spec has errata.

The ATOM people seem to think that if a simple spec like RSS can have ambiguities they can fix it with a more complex spec, which anyone who actually does this stuff for a living will tell you just leads to more complex ambiguities to deal with, not fewer.

I wish them luck. As I implement their spec I at least hope that some of these ATOM supporters get a clue and actually use some of the features of ATOM that RSS users have enjoyed for a while but that are lacking in all of the feeds I linked to above, such as the ATOM equivalent of wfw:commentRss. It's quite irritating to be able to read the comments to any .TEXT or dasBlog weblog in my news aggregator but then have to navigate to the website when I'm reading a Movable Type or LiveJournal feed to see the comments.  


 

Categories: XML

In a post entitled A Plea to Microsoft Architects, Michael Earls writes

This post is in response to a post by Harry Pierson over at DevHawk...

It is abundantly frustrating to be keeping up with you guys right now.  We out here in the real world do not use Longhorn, do not have access to Longhorn (not in a way we can trust for production), and we cannot even begin to test out these great new technologies until version 1.0 (or 2.0 for those that wish to stay sane)...My job is to work on the architecture team as well as implement solutions for a large-scale commercial website using .NET.  I use this stuff all day every day, but I use the  1.1 release bits.

Here's my point, enough with the "this Whidbey, Longhorn, XAML is so cool you should stop whatever it is you are doing and use it".  Small problem, we can't.  Please help us by remembering that we're still using the release bits, not the latest technology... Oh yeah, we need more samples of current bits and less of XAML.

Remember, we're your customers and we love this new technology, but we need more of you to focus CURRENT topics on CURRENT RELEASE bits.  I don't want to read about how you used XAML and SOA to write a new version of the RSS wheel.  The RSS I have now is fine (short of the namespace that Harry mentions).  Leave it alone.

The only folks at Microsoft with "architect" in their job title that blog that I can think of are Don, Chris Anderson and Chris Brumme, so I assume Michael is complaining about one or more of these three, although there may be other software architect bloggers at Microsoft that I am unaware of. The first point I'd note is that most people who blog at Microsoft do so without any official direction, so they blog about what interests them and what they are working on, not what MSDN, PSS or our documentation folks think we need more public documentation and guidance around. That said, architects at Microsoft usually work on next generation technologies since their job is to guide and supervise their design, so it is to be expected that when they blog about what they are working on it will be about next generation stuff. The people who work on current technologies and are most knowledgeable about them are the Program Managers, Developers and Testers responsible for the technology, not the architects that oversee and advise their design.

My advice to Michael would be that he should broaden his blog horizons and consider reading some of the other hundreds of Microsoft bloggers, many of whom blog about current technologies, instead of focusing on those folks who are designing stuff that'll be shipping in two or more years and complaining when they blog about said technologies.

This isn't to say I disagree with Michael's feedback; in fact, being a firm believer in Joel Spolsky's Mouth Wide Shut principle, I agree with most of it (except for the weird bit about blogging about next generation stuff increasing the perception that Microsoft is a monopoly). However he and others like him should remember that most of us blogging are just talking about what we're working on, not trying to give people "version envy" because we get to run the next version of the .NET Framework or Windows years before they ship.  

I have no idea how Chris Anderson, Don Box and other Microsoft architect bloggers will react to Michael's feedback but I hope they take some of it to heart.

[Update: Just noticed another Microsoft blogger with "architect" in his job title, Herb Sutter. Unsurprisingly he also blogs about the next release of the product he works on, not current technology.]
 

Categories: Life in the B0rg Cube

December 24, 2003
@ 05:09 AM

Joshua Allen writes

 Before discussing qnames in content, let's discuss a general issue with qnames that you might not have known about.  Take the following XML:

<?xml version="1.0" ?>
<root xmlns:p="http://foo.org">
  <p:elem att1="" att2="" ... />
  <p:elem att1="" att2="" ... xmlns:p="http://bar.org" />
  <x:elem att1="" att2="" xmlns:x="http://foo.org" />
</root>

Notice the first two elements, both ostensibly named "p:elem", but if we treat the element names as opaque strings, we'll get confused and think the elements are the same.  Luckily, we have this magical thing called a qname that uses namespace instead of prefix, and so we can note that the two element names are actually "{http://foo.org}elem" and "{http://bar.org}elem" -- different.  By the same token, if we compare the first and third element using opaque strings, we think that they are different ("p:elem" and "x:elem").  But if we look at the qnames, we see they are both "{http://foo.org}elem".
...
so what is the big deal for qnames in content?  Look at the following XML:

<?xml version="1.0" ?>
<root xmlns:x="urn:x" xmlns:p="http://www.foo.org" >
  <p:elem>here is some data: with a colon for no good reason</p:elem>
  <p:elem>x:address</p:elem>
  <p:elem xmlns:x="urn:y">x:address</p:elem>
</root>

Now, do the last two "p:elem" elements contain the same text, or different text?  If you compared using XSLT or XPath, what would be the result?  How about if you used the values in XSD key/keyref?  The answer is that XSLT and XPath have no way of knowing that you intend those last two elements to be qnames, so they will treat them as opaque strings.  With XSD, you could type the node as qname... Most APIs are smart enough to inject namespace declarations if necessary, so the first node would write correctly as:

<p:elem xmlns:p="http://www.foo.org">here is some data: with a colon for no good reason</p:elem>

But, since the DOM has no idea that you stuffed a qname in the element content, it's got no way to know that you want to preserve the namespace for x:

<p:elem xmlns:p="http://www.foo.org">x:address</p:elem>

There is really only one way to get around this, and this is for any API which writes XML to always emit namespace declarations for all namespaces in scope, whether they are used or not (or else understand enough about the XSD and make some guesses).  Some APIs do this, but it is not something that all APIs can be trusted to do, and it yields horribly cluttered XML output and other problems.

Joshua has only scratched the surface of the real problem, which is that there is no standard way to write out an XML infoset with the PSVI contributions added during validation. In plain English, there is no standard way to write out an XML document that has been validated using W3C XML Schema containing all the relevant type annotations plus other infoset augmentations. In the above example, the fact that the namespace declaration that uses the "x" prefix is not included in the output is not as significant as the fact that there is no way to tell that the type of p:elem's content is the xs:QName type.

However this doesn't change the fact that using QNames in content in an XML vocabulary is a bad idea. Specifically I am talking about using the xs:QName type in your vocabulary. The semantics of this type are so absurd they boggle the mind. Below is the definition from the W3C XML Schema recommendation

[Definition:]   QName represents XML qualified names. The ·value space· of QName is the set of tuples {namespace name, local part}, where namespace name is an anyURI and local part is an NCName. The ·lexical space· of QName is the set of strings that ·match· the QName production of [Namespaces in XML].

This basically says that text content of type xs:QName in an XML document, such as "x:address", actually is a namespace name/local name pair such as "{http://www.example.com}address". This instantly means that you cannot interpret this type without carrying around some sort of context (i.e. a list of namespace name<->prefix bindings), which makes it different from most other types defined in the W3C XML Schema recommendation because it has no canonical lexical representation. A value such as "x:address" is meaningless without knowing what XML document it came from and specifically what the namespace binding for the "x" prefix was at that particular scope.
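
You can see this context dependence in a few lines of code. Here's a quick sketch (it assumes the third-party lxml package, whose nsmap property exposes the in-scope namespace bindings) that resolves the QNames in content from Joshua's second example; the identical string "x:address" expands to two different names:

# The same text "x:address" expands to two different {namespace}local
# pairs depending on the bindings in scope. Requires the lxml package.
from lxml import etree

doc = etree.fromstring(
    b'<root xmlns:x="urn:x" xmlns:p="http://www.foo.org">'
    b'<p:elem>x:address</p:elem>'
    b'<p:elem xmlns:x="urn:y">x:address</p:elem>'
    b'</root>'
)

for elem in doc.findall('{http://www.foo.org}elem'):
    prefix, local = elem.text.split(':', 1)
    namespace = elem.nsmap[prefix]  # prefix->namespace bindings in scope here
    print('{%s}%s' % (namespace, local))

# Prints:
#   {urn:x}address
#   {urn:y}address
# XPath or XSLT string comparison would treat both values as the same
# opaque string "x:address".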

Of course, the existence of the QName type means you can do interesting things like use a different prefix for a particular namespace in the schema than you use in the XML instance. For example, you could specify that the content of the <p:elem> element should be one of a:address or a:location but have x:address in the instance, which would be fine if the "a" prefix is bound to the "http://www.example.com" namespace in the schema and the "x" prefix is bound to the same namespace in the instance document. You can also ask interesting questions such as: what happens if I have a default value that is of type xs:QName but there is no namespace declaration for the namespace name at that scope? Does this mean that not only should a default value be inserted as the content of an element or attribute but also that a namespace declaration should be created at the same scope if one does not exist?

Fun stuff, not.


 

Categories: XML