The W3C xml:base recommendation describes the xml:base attribute, which, when it appears on an XML element, lets one specify a base URI for the element and its children that differs from the base URI of the document or external entity. The base URI of a document or entity is the URI from which the document or entity was loaded. For example, the base URI of my RSS feed is http://www.25hoursaday.com/weblog/SyndicationService.asmx/GetRss. The following example, taken from the W3C recommendation, shows how xml:base processing works.

<?xml version="1.0"?>
<doc xml:base="http://example.org/today/"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <paragraph>See <link xlink:type="simple" xlink:href="new.xml">what's
      new</link>!</paragraph>
    <paragraph>Check out the hot picks of the day!</paragraph>
    <olist xml:base="/hotpicks/">
      <item>
        <link xlink:type="simple" xlink:href="pick1.xml">Hot Pick #1</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick2.xml">Hot Pick #2</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick3.xml">Hot Pick #3</link>
      </item>
    </olist>
  </body>
</doc>

The URIs in the xlink:href attributes in this example resolve to full URIs as follows:

  • "what's new" resolves to the URI "http://example.org/today/new.xml"

  • "Hot Pick #1" resolves to the URI "http://example.org/hotpicks/pick1.xml"

  • "Hot Pick #2" resolves to the URI "http://example.org/hotpicks/pick2.xml"

  • "Hot Pick #3" resolves to the URI "http://example.org/hotpicks/pick3.xml"
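
The resolution steps above can be reproduced with ordinary relative-URI resolution. The following is a minimal sketch in Python (my choice of language for illustration, not something the recommendation prescribes) using the standard library's urljoin; the variable names are assumptions of this sketch.

```python
from urllib.parse import urljoin

# Base URI declared by xml:base on the <doc> element in the example.
doc_base = "http://example.org/today/"

# The xml:base on <olist> is itself a relative reference, so it is
# resolved against the base URI inherited from <doc>.
olist_base = urljoin(doc_base, "/hotpicks/")

# Each xlink:href is resolved against the base URI in scope for its element.
whats_new = urljoin(doc_base, "new.xml")
hot_picks = [urljoin(olist_base, name)
             for name in ("pick1.xml", "pick2.xml", "pick3.xml")]

print(whats_new)
for uri in hot_picks:
    print(uri)
```

Note that the leading slash in "/hotpicks/" is what replaces the "/today/" path segment rather than appending to it.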

xml:base exists as a mechanism to mimic HTML's BASE element and bring that functionality to the XML world. It was supposed to be a companion technology to XLink, which was supposed to be a generic way to describe links in XML documents. Both XLink and xml:base were expected to be used in XHTML 2.0. However, the XHTML working group rejected them and instead proposed HLink, which was in turn rejected by the W3C Technical Architecture Group. A lot of this is covered in the XML.com articles Introducing HLink and TAG Rejects HLink by Kendall Clark.

Even though xml:base has been rejected by the designers of the technologies it was primarily intended to be used with, it has still made its way into the core of the XML family of technologies. Specifically, xml:base is used by the XML Infoset recommendation to define base URIs. This elevated xml:base and HTML-style base URI processing from being an application-specific construct to being a core part of XML that should be supported by XML parsers. For example, XQuery and XPath 2.0 will have the base-uri() function, which returns the base URI of a node and takes the xml:base attribute into account.
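
To make the idea of a per-node base URI concrete, here is a rough sketch in Python of the kind of lookup a base-uri()-style function performs. It uses the standard library's ElementTree, which has no built-in xml:base support; the helper name base_uri, the parent map, and the document_uri parameter are assumptions of this sketch and not part of XQuery, XPath, or any .NET API.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Name ElementTree uses for the xml:base attribute (namespace in braces).
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def base_uri(node, parents, document_uri=""):
    """Hypothetical helper: effective base URI of `node`."""
    # Collect xml:base values from the node up to the root element.
    chain = []
    while node is not None:
        value = node.get(XML_BASE)
        if value is not None:
            chain.append(value)
        node = parents.get(node)
    # Resolve from the outermost value inwards, starting at the URI
    # the document was loaded from.
    uri = document_uri
    for value in reversed(chain):
        uri = urljoin(uri, value)
    return uri

doc = ET.fromstring(
    '<doc xml:base="http://example.org/today/">'
    '<olist xml:base="/hotpicks/"><item/></olist>'
    '<paragraph/>'
    '</doc>'
)
# ElementTree keeps no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in doc.iter() for child in parent}

item = doc.find("olist/item")
paragraph = doc.find("paragraph")
print(base_uri(item, parents))
print(base_uri(paragraph, parents))
```

An element with no xml:base of its own simply inherits the value computed for its nearest ancestor that has one, falling back to the document's own URI.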

The next question is whether the .NET Framework supports the xml:base recommendation. At first glance it looks this way, since there is a BaseURI property on both the XmlNode and XmlReader classes. However, these properties report the base URI in the classic sense only (i.e. where the node was loaded from, which is either the URI of the document or the URI of the entity it was expanded from). We were planning to add support for xml:base to the core XML parser as part of implementing XInclude. However, XInclude recently went from being a W3C candidate recommendation back to being a W3C working draft (partly due to a number of the architectural issues raised by Murata Makoto), so the future of the spec is currently uncertain and we've backed off on our implementation. In the meantime, developers can use XInclude.NET if they need XML Inclusions and its associated support for the xml:base attribute in the .NET Framework.
 

Categories: XML

Daniel Cazzulino writes in response to Don Demsak's post on Waking Up From A DOM Induced Coma

So, in this regard, I believe SUN is doing a good job at concentrating on pluggable and standard interfaces and specifications, and letting whoever wants to take the time to implement custom stuff.
I don't want to "new XmlTextReader". I want some app/system-wide factory take care of creating the appropriate parser implementation for me based on declarative configuration, and I want my code to work against a single unified interface/base class always.
Changing the parser shouldn't mean I have to change my working app code. If MS provides the appropriate abstractions, it wouldn't even be necessary to rely on some implementation-specific feature such as XmlTextReader.GetRemainder that is not part of the abstract contract defined by XmlReader.

I both agree and disagree with Daniel. We do have a single unified interface for processing XML which developers can program against; it is called the XmlReader. Unfortunately, we subclassed this class into the XmlTextReader and XmlValidatingReader, which are what most developers actually program against, including our devs internally. In the next version of the .NET Framework we are moving away from the XmlTextReader and XmlValidatingReader. Instead we will emphasize programming directly to the XmlReader and will provide an implementation of the factory design pattern that returns different XmlReader instances based on which features the user is interested in. More importantly, users will be able to layer different XmlReader implementations on those created by our factory, which has been our intention since v1.0 of the .NET Framework. For example, one could layer XSD validation on top of the XIncludingReader from XInclude.NET to combine third party XInclude support with Microsoft's W3C XML Schema validation technologies.
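
The factory-plus-layering idea can be sketched in a few lines. The following Python is purely illustrative and is not the .NET API: the Reader base class, the two wrapper classes, and the create_reader function with its parameters are all hypothetical names invented for this sketch, and the "validation" layer is a toy check on the root element name.

```python
from abc import ABC, abstractmethod

class Reader(ABC):
    """Hypothetical stand-in for an abstract XmlReader-style pull API."""
    @abstractmethod
    def read(self):
        """Yield (kind, value) parse events."""

class ListReader(Reader):
    """Toy 'core parser' that replays a prepared list of events."""
    def __init__(self, events):
        self._events = list(events)

    def read(self):
        yield from self._events

class WhitespaceSkippingReader(Reader):
    """Layer that drops whitespace-only text events from an inner reader."""
    def __init__(self, inner):
        self._inner = inner

    def read(self):
        for kind, value in self._inner.read():
            if kind == "text" and not value.strip():
                continue
            yield kind, value

class RequiredRootReader(Reader):
    """Layer standing in for validation: rejects an unexpected root element."""
    def __init__(self, inner, root):
        self._inner = inner
        self._root = root

    def read(self):
        seen_root = False
        for kind, value in self._inner.read():
            if kind == "start" and not seen_root:
                seen_root = True
                if value != self._root:
                    raise ValueError(f"expected root <{self._root}>, got <{value}>")
            yield kind, value

def create_reader(events, ignore_whitespace=False, required_root=None):
    """Hypothetical factory: callers name features, not concrete classes."""
    reader = ListReader(events)
    if ignore_whitespace:
        reader = WhitespaceSkippingReader(reader)
    if required_root is not None:
        reader = RequiredRootReader(reader, required_root)
    return reader

events = [("start", "doc"), ("text", "  \n"), ("text", "hi"), ("end", "doc")]
reader = create_reader(events, ignore_whitespace=True, required_root="doc")
result = list(reader.read())
print(result)
```

Because every layer accepts and exposes the same abstract type, a third party layer can wrap a factory-created reader (or be wrapped by one) without the calling code changing.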

As for whether Sun's approach of just providing interfaces instead of concrete implementations for XML parsing was such a great thing in Java, I'd claim that it's been hit and miss. Most XML developers from the Java world despise the DOM for the reasons described in Chapter 33 of Elliotte Rusty Harold's Effective XML. This is the reason for the existence of extensions and alternatives to the DOM API such as Oracle's XDK, dom4j, JDOM, Xerces and XOM. Heck, you can't even get the XML as a string out of a node or save an XML document object to a file without using extensions, since these aren't in the base DOM API. As for SAX, the API just gives you access to regular parsing events, nothing fancy. There isn't much difference functionally between programming against the base SAX APIs and programming against XmlReader.

The one point of interest is that Daniel claims that the Java way of not shipping with any XML APIs but just interfaces is somehow better than the .NET way. In Java one can program against interfaces and load the XML parser by passing the class name to a factory method. One could put this name in a config file and change it at runtime. The question is whether anyone in the .NET world actually thinks being able to change your XML parser implementation at runtime is anything more than a geek feature. I consider it as geeky as asking why you can't change the implementation of the System.String class to a user defined class that uses less memory at runtime without having to recompile. An interesting idea, but one primarily of interest to the ultimate of power users.
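
For readers who haven't seen this style of parser selection, here is a small sketch of the same idea in Python rather than Java. Python's xml.sax.make_parser accepts a list of module names to try, which is a close analogue of passing a class name to a factory method; the config dictionary and the TitleCollector handler are hypothetical names made up for this example.

```python
import io
import xml.sax

# Hypothetical configuration; in a real app this might be read from a file.
config = {"sax_parser_module": "xml.sax.expatreader"}

class TitleCollector(xml.sax.ContentHandler):
    """Application code, written only against the SAX handler interface."""
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

# make_parser tries the named modules first, then falls back to its defaults.
parser = xml.sax.make_parser([config["sax_parser_module"]])
handler = TitleCollector()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(b"<doc><head><title>Virtual Library</title></head></doc>"))
print(handler.titles)
```

Swapping the parser means changing the configured module name only; the handler code stays the same, which is the property Daniel is asking for.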

The funny thing is that even if we shipped functionality where we looked in the registry or in some config file before figuring out what XML parser to load, it's not as if there is an abundance of third party XML parsers targeting the .NET Framework in the first place. There is definitely no intention to ship any functionality like this in future versions of the .NET Framework.


 

Categories: XML

February 28, 2004
@ 05:20 AM

Dylan Greene was at Microsoft last week and talks about some observations about blogging and Microsoft in his post My meeting with the Scoblizer  

Some interesting things I picked up while at Microsoft:

  • None of my friends there blog.
  • None of them had heard of Scoble. (!)
  • None of them use RSS readers or read blogs with any frequency.
  • None of them seemed to understand the draw of blogging.

There are about 300 people blogging at Microsoft, which sounds like a lot until you realize that at last count Microsoft had 55,000 employees. That means less than 1% of the employees at Microsoft are blogging. When you consider that, it isn't surprising that none of his friends blog, since fewer than one in every hundred Microsoft employees do, or that they didn't know some random evangelist on the Windows team by name.

That said I do agree with Cameron Reilly that Microsoft is “still way ahead of the curve in terms of corporate blogging”.


 

Categories: Life in the B0rg Cube

February 27, 2004
@ 11:36 PM

I've been watching the online discussions about the proposed constitutional amendment to ban gay marriage with bemusement. It is such a classic sleight of hand trick. If I were a sitting president who'd been discovered to have started a war that cost thousands of lives primarily to enrich my defence contractor buddies, and had the opposition party's presidential candidates polling better than me, I'd want to come up with a way to focus the public discourse away from these issues. Perhaps with controversial proposed legislation that would be a hot button topic but most likely wouldn't get passed anyway? Yeah, probably.

It is unfortunate that such political games end up affecting people's lives and preventing the pursuit of happiness. At least it's not another phony war.


 

Categories: Ramblings

February 27, 2004
@ 08:20 PM

Given the fact that about 15 news aggregators currently support Mark Nottingham's Atom Syndication Format 0.3 (PRE-DRAFT) I'll be adding support for it to RSS Bandit this weekend. This won't be a big deal to implement relative to a number of other features Torsten and I have in mind. As Brent Simmons wrote

This experience was a reminder for me of how unimportant the underlying syndication formats are, in a way. What percent of time does an aggregator developer spend on RSS and Atom parsing code? 50%? 25%? 10%?

I figure it’s somewhere less than 1%.

The rest of the time is taken up with things like data storage, networking, and user interface. But mostly user interface. Not just implementing—which is often easy—but designing user interface, which is difficult.

In other RSS Bandit news Torsten is almost done with some code that fixes our #2 performance problem in RSS Bandit and Phil Haack has started work on official RSS Bandit documentation. Excellent work.

All of the above should show up in the next RSS Bandit release. Phil's documentation will most likely reside on the RSS Bandit Documentation Page on SourceForge and will be linked to from the RSS Bandit help menu.


 

Categories: RSS Bandit

February 26, 2004
@ 05:54 PM

Aaron Swartz has lots of interesting ideas about politics and copyright in the age of digital media. I disagree with a lot of his ideas on both, but they are often well thought out and interesting. This month he continues his trend of interesting posts about politics with two entries: Up is Down: How Stating the False Hides the True, excerpted below

One of the more interesting Republican strategies is saying things whose opposite is true. They say that the Democratic nominee is bought off by special interests, the Democrats are outspending them, the Democrats are playing dirty, the Democrats don’t care about homeland security, the Democrats hate America, all when this is far more true of the Republicans. They say Joseph McCarthy was a noble man, the media has a liberal bias, affirmative action is bad for equality, Saddam had weapons of mass destruction, and Ronald Reagan was our greatest President, all when the opposite is far more true.

At first glance this seems bizarre — why draw attention to your weaknesses? But it’s actually a very clever use of the media. The media tries hard to be “fair and balanced”, and it generally believes the best way to do this is to present the opinions from both sides and make as few judgement calls as possible (to avoid introducing their own bias). And if there’s a debate on some issue, taking a side is seen as a judgement call.

and Down is Up: What This Stuff Is where he writes

I got a lot of responses to my previous post, Up is Down, along the lines of “oh, the Democrats lie as much as the Republicans”. But the piece was not about lies. For lack of a better term, it was about anti-truths. Anti-truths have two parts:

  1. They’re completely false.
  2. They’re more accurate when directly reversed.

It’s hard to find a completely unobjectionable one, but take “Ronald Reagan was our greatest President.” As for part one, I have seen no evidence that Reagan actually did anything particularly good on purpose and as for two, “Ronald Reagan was our worst President” seems to be a far more accurate statement, since he did lots of things that were quite bad.

My example of an anti-truth would have been “John Ashcroft respects the US constitution”. :)


 

February 26, 2004
@ 05:45 PM

In his post WinFS Scenario #2: event planning Jeremy Mazner writes

So as you can see, the information for any given event is spread all over the place, which means I can’t really keep track of it all.  If I want to see the invite list and RSVP status, I’m either in Outlook (if I’m lucky) or off to some file share to find the spreadsheet.  If I want to see some information about an invitee (their phone number, or who their account manager is), it’s off to the directory or CRM system to look it up.  If I want to know what presentations have been approved for use, I crawl through email to find the one message from my general manager where he says the presentation is ready.  If I want to see the actual presentation, it’s back to a file share or Sharepoint

What I really want is a way to corral all these related items together, stick them in a big bucket with a label that says what event they’re for.  I want a simple UI where I can see what events are coming up, then browse through all the related material for each one, and maybe be able to answer some simple questions: how many presentations for this event are in the Approved state?  How many attendees have declined the invitation?

 

I’ll assert that this is really, really hard to do today.  Outlook wizards would probably argue that you could do this with some series of categories, public folders, and shared calendars.  SharePoint gurus would say this is exactly what a Meeting Workspace is for.  Old school event planners might claim you could track this all in a nice big spreadsheet with multiple pages and links out to file locations.

This is a valid problem that Jeremy brings up in his scenario, and one I've thought about in the past when it comes to tying together information from disparate applications about the same person. Outlook comes closest to doing what I'd like to see here, but it gets help from Exchange. I was curious as to how Jeremy thought WinFS could help here, and his thoughts were similar to what I'd first thought when I heard about WinFS. Specifically, he wrote

What does WinFS provide that will help?

  • A common storage engine, one unified namespace for storage of any application data on your machine.  Whether I use Outlook, AOL Communicator, or Notes, my emails can all be stored in WinFS.  (Yes, I understand that new versions of these apps will have to be built…encouraging that is my job as evangelist.)
  • A set of common schemas, so that an email is an email is an email, no matter what app created it, and it always has a To:, From: and Subject: that I can access through the same API.
  • A data model that supports relationships, so that my event management app can specify “this email from Bill is related to the Longhorn Design Review event, as is this calendar appointment for next month”
  • A data model that supports extensions and meta-data on relationships, so that I not only say “this contact Jon is associated with this Design Review event”, but also “Jon is a speaker at this event” and “Jon is the author of this deck that he’ll present” and “Jon has not yet confirmed attendance at the party afterwards.”
  • Win32 file system access, so that even though files are stored in WinFS, applications can still get to their streams via a Win32 path

It seems Jeremy and I had the same thoughts about how WinFS could help in this regard. After thinking about the problem for a little bit, I realized that having all applications store similar information in a central repository brings a number of problems with it. The two main problems I can see are security and unreliable applications that cause data corruption. As an example, imagine that RSS Bandit, SharpReader, and FeedDemon all stored their data in WinFS using the model that Jeremy describes above. This would mean that all RSS/Atom feeds and configuration data used by each application are stored not in application-specific folders, as is done today, but in a unified store of RSS items. Here are two real problems that need to be surmounted:

  1. In the past, bugs in RSS Bandit that led to crashes also caused corrupted feed files. It's one thing for bugs in an application to corrupt its own data or configuration files but another for it to corrupt globally shared data. This is akin to the pain users and developers have felt when buggy apps corrupt the registry. The WinFS designers will have to account for this occurrence in some way, even if it is just by coming up with application design guidelines.

  2. For feeds that require authentication, RSS Bandit stores the user name and password required to access the feed in configuration files. Having such data globally shared means that proper security precautions must be taken. Just because a user has entered a password in RSS Bandit doesn't mean they want it exposed to any other applications, especially potentially malicious ones. Of course, this is no different from what can happen today in Windows (e.g. most modern viruses/worms search the file system for email address book files to locate new victims to email themselves to), but with the model being pushed by WinFS this becomes a lot easier. A way to share data between WinFS-aware applications in a secure manner is a must have.

Neither of these is an insurmountable problem, but they do point out that things aren't as easy as one suspects at first glance at WinFS. I pass Mike Deem in the hallway almost daily; I should talk to him about this stuff more often. In the meantime I'll be sure to catch the various WinFS designers on the next episode of the .NET Show on MSDN.


 

Categories: Technology

February 25, 2004
@ 12:16 PM

In his post JDOM Hits Beta 10 Jason Hunter writes

According to my Palm Pilot calendar, we laid out the vision for JDOM on March 28th, 2000. I figure we'll ship before March 28, 2004. If we can ship 1.0 before it's been a full four years, I can just round down and call it three. :-)

What took it so long? Several things. I discovered XML is "fractally complex". At the highest level it appears only slightly complicated, but as you dig deeper you discover increasing complexity, and the deeper you go the more complicated it continues to become. Trying to be faithful to the XML standards while staying easy to use and intuitive was a definite challenge.

This is one of the challenges I face in my day job designing XML APIs for the .NET Framework. The allure of XML and its related technologies is that they appear simple and straightforward, but once one digs a little it turns out that everything isn't quite as easy as it seemed at first.

One of the drawbacks of this appearance of simplicity is that everyone thinks they can write an XML parser, which leads to occurrences such as the one described in this post by Shawn Farkas, Creating a SecurityElement from XML

The overhead of a full-fledged XML parser would be too much. Even if you accept the fact that we need a lightweight security XML object, we can't even provide utility methods on SecurityElement to convert back and forth System.Xml objects, since the CAS code lives in mscorlib.dll, and mscorlib cannot take a dependency on external DLL's. (Think of what would happen if mscorlib depended on System.Xml.dll, and System.Xml.dll depended on mscorlib ...). As if this weren't enough, there are at least 3 distinct XML parsers in v1.1 of the framework (System.Xml, SecurityElement, and a lightweight parser in mscoree.dll which handles parsing .config files ... this was actually optimized to be able to fit into no more than two pages of memory). Whidbey will be adding yet another parser to handle parsing ClickOnce manifests

One of the things I'm currently working on is coming up with guidelines that prevent occurrences like System.Security.SecurityElement, a class that represents XML but does not interact well with the rest of the XML APIs in the .NET Framework, from happening again. This will be akin to Don Box's MSDN TV episode Passing XML Data in the CLR but will take the form of an Extreme XML article and a set of .NET Framework design guidelines.


 

Categories: XML

February 22, 2004
@ 09:08 PM

Yesterday my mom and I went on a train ride that is often billed as a way for couples to spend a special occasion. The train was full of couples celebrating anniversaries, birthdays and other special occasions. Quite a number of couples were making out openly by the end of the train ride, whose main features are a picturesque dinner on the train and a stop with a tour of a local winery.

One of the less romantic aspects of this train ride is that for the most part you have to share a table with another couple, facing them. This means they get to overhear and interrupt your conversation. The couple we shared our table with were celebrating the guy's birthday, and his girlfriend was treating him to a special day out that ended with the train ride. After we got back on the train from the winery tour the unexpected happened. They were engaged in conversation and he was comparing her favorably to ex-girlfriends, then all of a sudden he got down on one knee and pulled out a box with a ring in it. After a stunned silence she took it, said some words softly, then said “I appreciate the sentiment but the timing is inappropriate” and handed it back. This was followed by her voicing her concerns about his ability to support them and him rattling off how much he made a month plus various bonuses, etc. I think it went downhill from there.

All through this I was staring out the window trying to make small talk with my mom but failing miserably. If and whenever I do end up proposing to someone I've definitely learned a thing or two about what not to do.


 

Categories: Ramblings

Jon Udell writes in his entry Heads, decks, and leads: revisited

Yesterday, for example, Steve Gillmor told me that he's feeling overwhelmed by thousands of unread items in NetNewsWire. Yet I never feel that way. I suspect that's because I'm reading in batches of 100 (in the Radio UserLand feedreader). I scan each batch quickly. Although opinions differ as to whether or not a feed should be truncated, my stance (which I'm reversing today) has been that truncation is a useful way to achieve the effect you get when scanning the left column of the Wall Street Journal's front page. Of the 100 items, I'll typically only want to read several. I open them into new Mozilla tabs, then go back and read them. Everybody's different, but for me -- and given how newspapers work, I suspect for many others too -- it's useful to separate the acts of scanning and reading. When I'm done with the batch, I click once to delete all 100 items.

and in today's post entitled Different strokes he writes

I agree. In trying to illustrate a point about scanning versus reading, I'm afraid I fanned the flames of the newsreader-style versus browser-style debate. In fact, the two modes can be complementary. I just bought the full version of NetNewsWire, which exploits that synergy as Brent describes. So does FeedDemon, which this posting prompted me to re-explore.

This highlights a conflict between the traditional 3-pane aggregators, which follow the mail or news reader model that implies every post is important and should be read one by one, and web-style aggregators like Radio UserLand, which present blogs in a unified web-based view reminiscent of an aggregated blog or newspaper. On the RSS Bandit wiki there's a wishlist item that reads

Newspaper view. A summary of unread feed items, formatted by an XSLT stylesheet and displayed as HTML/PDF. Inspired by Don Park. also here

which was originally added by Torsten. He never got around to adding this feature because he felt it wasn't that useful after all. I never implemented it because one would have to provide a way to interact with posts from this newspaper view (i.e. mark them as read or deleted, view comments, etc.), which translates to either JavaScript coding or running a local web server. Neither option was palatable.

This morning I downloaded FeedDemon to see how it got around these problems for its newspaper view. I found out that it does the obvious thing: it doesn't. From what I gather there is an option to 'mark all items in a channel as read' once you leave the channel. So once you close the newspaper view it assumes every post that showed up in it was read. A heavy-handed approach, but it probably works for the most part.
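
As a rough illustration of that approach, here is a minimal Python sketch. It is not FeedDemon's or RSS Bandit's actual code; the FeedItem and NewspaperView names and their fields are hypothetical, and escaping the summary is a simplification since real feed content is usually HTML already.

```python
from dataclasses import dataclass
from html import escape

@dataclass
class FeedItem:
    """Hypothetical minimal feed item."""
    title: str
    summary: str
    read: bool = False

class NewspaperView:
    """Static page of unread items; no per-item interaction is offered."""
    def __init__(self, items):
        # Snapshot the unread items this page will display.
        self._shown = [item for item in items if not item.read]

    def render(self):
        parts = ["<html><body>"]
        for item in self._shown:
            parts.append(f"<h2>{escape(item.title)}</h2>")
            parts.append(f"<p>{escape(item.summary)}</p>")
        parts.append("</body></html>")
        return "\n".join(parts)

    def close(self):
        # Heavy-handed rule: everything that was displayed counts as read.
        for item in self._shown:
            item.read = True

items = [
    FeedItem("xml:base and .NET", "Base URIs in XML"),
    FeedItem("Old post", "Already seen", read=True),
    FeedItem("Isn't my cat cute?", "Photos & more"),
]
view = NewspaperView(items)
page = view.render()
view.close()
print(page)
```

The trade-off is visible in close(): the view needs no JavaScript or local web server, but it cannot tell which of the displayed items the user actually looked at.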

Looks like something else to add to the RSS Bandit TODO list.

I've been thinking that something like this is necessary after reading Robert Scoble's post 1296 newsfeeds +are+ sustainable where he wrote

Here's my workflow:

At about 5 p.m. every day I tell NewsGator to get me my feeds. It is downloading them in the background as I speak.

Then I open each folder that's bold...

Then I only read the headlines. I'm getting very good at ignoring headlines with subjects like "isn't my cat cute?" See, that's another productivity point. Robin probably assumes I read all the crap that people post. I don't. I only read those things that MIGHT be interesting. If I find a headline that's interesting, then I scan the article it is associated with. I don't read it. Just scan at that point. Usually that means reading the first paragraph and scanning the rest for later.

I've found that reading headlines isn't always the best way to find good stuff, and I wouldn't mind a way to quickly scan all the articles in a category that goes beyond eyeballing a bunch of headlines. However, I'm going to avoid Jon Udell's advice that XHTML-izing all the HTML content in feeds is the way to get there. Been there, done that, not going back. The approach used by FeedDemon is a step in the right direction and doesn't require absorbing the problems that come with trying to convert the ill-formed markup that typically shows up in feeds to XHTML.


 

Categories: RSS Bandit | XML