The W3C xml:base recommendation describes how the xml:base attribute, when it appears on an XML element, allows one to specify a base URI for that element and its children other than the base URI of the document or external entity. The base URI of a document or entity is the URI from which the document or entity was loaded. For example, the base URI of my RSS feed is http://www.25hoursaday.com/weblog/SyndicationService.asmx/GetRss. The following example, taken from the W3C recommendation, shows how xml:base processing works.

<?xml version="1.0"?>
<doc xml:base="http://example.org/today/"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <paragraph>See <link xlink:type="simple" xlink:href="new.xml">what's
      new</link>!</paragraph>
    <paragraph>Check out the hot picks of the day!</paragraph>
    <olist xml:base="/hotpicks/">
      <item>
        <link xlink:type="simple" xlink:href="pick1.xml">Hot Pick #1</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick2.xml">Hot Pick #2</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick3.xml">Hot Pick #3</link>
      </item>
    </olist>
  </body>
</doc>

The URIs in the xlink:href attributes in this example resolve to full URIs as follows:

  • "what's new" resolves to the URI "http://example.org/today/new.xml"

  • "Hot Pick #1" resolves to the URI "http://example.org/hotpicks/pick1.xml"

  • "Hot Pick #2" resolves to the URI "http://example.org/hotpicks/pick2.xml"

  • "Hot Pick #3" resolves to the URI "http://example.org/hotpicks/pick3.xml"

xml:base exists as a mechanism to mimic HTML's BASE element and bring that functionality to the XML world. It was intended as a companion technology to XLink, which was meant to be a generic way to describe links in XML documents. Both XLink and xml:base were expected to be used in XHTML 2.0. However, the XHTML working group rejected them and instead proposed HLink, which was in turn rejected by the W3C Technical Architecture Group. A lot of this is covered in the XML.com articles Introducing HLink and TAG Rejects HLink by Kendall Clark.

Even though xml:base has been rejected by the designers of the technologies it was primarily intended to be used with, it has still made its way into the core of the XML family of technologies. Specifically, xml:base is used by the XML Infoset recommendation to define base URIs. This elevated xml:base and HTML-style base URI processing from being an application-specific construct to being a core part of XML that should be supported by XML parsers. For example, XQuery and XPath 2.0 will have a base-uri() function which returns the base URI of a node and takes the xml:base attribute into account.

The next question is whether the .NET Framework supports the xml:base recommendation. At first glance it looks that way, since there is a BaseURI property on both the XmlNode and XmlReader classes. However these properties report the base URI in the classic sense only (i.e. where the node was loaded from, which is either the URI of the document or the URI of the entity it was expanded from). We were planning to add support for xml:base to the core XML parser as part of implementing XInclude, but given that it recently went from being a W3C candidate recommendation back to being a W3C working draft (partly due to a number of architectural issues raised by Murata Makoto), the future of that spec is currently uncertain, so we've backed off on our implementation. In the meantime, developers can use XInclude.NET if they need XML Inclusions and its associated support for the xml:base attribute in the .NET Framework.
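To see what this means in practice, here is a quick sketch; the document URL is illustrative. Every node reports the URI the XML was loaded from, and the xml:base attributes in the markup are ignored:

using System;
using System.Xml;

public class BaseUriExample
{
    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.Load("http://example.org/today/doc.xml");

        // Reports the URI the document was loaded from, not the value
        // implied by the xml:base attribute on the olist element.
        XmlNode olist = doc.SelectSingleNode("//olist");
        Console.WriteLine(olist.BaseURI); // http://example.org/today/doc.xml
    }
}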
 

Categories: XML

Daniel Cazzulino writes in response to Don Demsak's post on Waking Up From A DOM Induced Coma

So, in this regard, I believe SUN is doing a good job at concentrating on pluggable and standard interfaces and specifications, and letting whoever wants to take the time to implement custom stuff.
I don't want to "new XmlTextReader". I want some app/system-wide factory take care of creating the appropriate parser implementation for me based on declarative configuration, and I want my to code to work against a single unified interface/base class always.
Changing the parser shouldn't mean I have to change my working app code. If MS provides the appropriate abstractions, it wouldn't even be necessary to rely on some implementation-specific feature such as XmlTextReader.GetRemainder that is not part of the abstract contract defined by XmlReader.

I both agree and disagree with Daniel. We do have a single unified interface for processing XML which developers can program against: it is called the XmlReader. Unfortunately, we subclassed this class into the XmlTextReader and XmlValidatingReader, which are what most developers actually program against, including our devs internally. In the next version of the .NET Framework we are moving away from the XmlTextReader and XmlValidatingReader. Instead we will emphasize programming directly against the XmlReader and will provide an implementation of the factory design pattern which returns different XmlReader instances based on which features the user is interested in. More importantly, users will be able to layer different XmlReader implementations on top of those created by our factory, which was always our intention since v1.0 of the .NET Framework. For example, one could layer XSD validation on top of the XIncludingReader from XInclude.NET to combine third-party XInclude support with Microsoft's W3C XML Schema validation technologies.
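To make the layering idea concrete, here is a rough sketch of how that example might look once the factory ships. The factory API shape is an assumption about the next version of the Framework, and the XIncludingReader class name, constructor overload and namespace from XInclude.NET are assumptions as well:

using System.Xml;
using GotDotNet.XInclude; // XInclude.NET; namespace and constructor are assumptions

public class LayeredReaderExample
{
    public static void Main()
    {
        // Third-party reader that performs XInclude processing over the raw document.
        XmlReader xincludingReader = new XIncludingReader(new XmlTextReader("books.xml"));

        // Factory-created reader that layers W3C XML Schema validation on top of it.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, "books.xsd");

        XmlReader validatingReader = XmlReader.Create(xincludingReader, settings);
        while (validatingReader.Read()) { /* consume the validated, XIncluded stream */ }
    }
}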

As for whether Sun's approach of providing just interfaces instead of concrete implementations for XML parsing was such a great thing in Java, I'd claim that it's been hit and miss. Most XML developers from the Java world despise the DOM for the reasons described in Chapter 33 of Elliotte Rusty Harold's Effective XML. This is the reason for the existence of extensions and alternatives to the DOM API such as Oracle's XDK, dom4J, JDOM, Xerces and XOM. Heck, you can't even get the XML as a string out of a node or save an XML document object to a file without using extensions, since these aren't in the base DOM API. As for SAX, the API just gives you access to regular parsing events, nothing fancy. There isn't much difference functionally between programming against the base SAX APIs and programming against the XmlReader.

The one point of interest is that Daniel claims that the Java way of not shipping with any XML APIs but just interfaces is somehow better than the .NET way. In Java one can program against interfaces and load the XML parser by passing the class name to a factory method. One could put this name in a config file and change it at runtime. The question is whether anyone in the .NET world actually thinks being able to change your XML parser implementation at runtime is anything more than a geek feature. I consider it as geeky as asking why you can't change the implementation of the System.String class to a user-defined class that uses less memory at runtime without having to recompile. An interesting idea, but one primarily of interest to the ultimate of power users.
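For what it's worth, nothing stops one from building the same sort of config-driven factory on top of the .NET Framework today. A minimal sketch, where the appSettings key and the fallback behavior are invented for illustration:

using System;
using System.Configuration;
using System.Xml;

public class ConfigurableReaderFactory
{
    public static XmlReader Create(string uri)
    {
        // e.g. <add key="xmlReaderType" value="Some.Vendor.VendorXmlReader, Some.Vendor"/>
        string typeName = ConfigurationSettings.AppSettings["xmlReaderType"];
        if (typeName == null)
            return new XmlTextReader(uri); // default to the built-in parser

        // Assumes the configured type derives from XmlReader and has a (string) constructor.
        Type readerType = Type.GetType(typeName, true);
        return (XmlReader)Activator.CreateInstance(readerType, new object[] { uri });
    }
}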

The funny thing is that even if we shipped functionality where we looked in the registry or in some config file to figure out what XML parser to load, it's not as if there is an abundance of third-party XML parsers targeting the .NET Framework in the first place. There is definitely no intention to ship any functionality like this in future versions of the .NET Framework.


 

Categories: XML

February 28, 2004
@ 05:20 AM

Dylan Greene was at Microsoft last week and shares some observations about blogging and Microsoft in his post My meeting with the Scoblizer

Some interesting things I picked up while at Microsoft:

  • None of my friends there blog.
  • None of them had heard of Scoble. (!)
  • None of them use RSS readers or read blogs with any frequency.
  • None of them seemed to understand the draw of blogging.

There are about 300 people blogging at Microsoft, which sounds like a lot until you realize that at last count Microsoft had 55,000 employees. That means less than 1% of the employees at Microsoft are blogging. With less than one in every hundred Microsoft employees blogging, it isn't that surprising that none of his friends blog or that they didn't know some random evangelist on the Windows team by name.

That said I do agree with Cameron Reilly that Microsoft is “still way ahead of the curve in terms of corporate blogging”.


 

Categories: Life in the B0rg Cube

February 27, 2004
@ 11:36 PM

I've been watching the online discussions about the proposed constitutional amendment to ban gay marriage with bemusement. It is such a classic sleight of hand trick. If I were a sitting president who'd been discovered to have started a war that cost thousands of lives primarily to enrich my defence contractor buddies, and had the opposition party's presidential candidates polling better than me, I'd want to come up with a way to focus the public discourse away from those issues. Perhaps with controversial proposed legislation that would be a hot button topic but most likely wouldn't get passed anyway? Yeah, probably.

It is unfortunate that such political games end up affecting people's lives and preventing the pursuit of happiness. At least it's not another phony war.


 

Categories: Ramblings

February 27, 2004
@ 08:20 PM

Given the fact that about 15 news aggregators currently support Mark Nottingham's Atom Syndication Format 0.3 (PRE-DRAFT) I'll be adding support for it to RSS Bandit this weekend. This won't be a big deal to implement relative to a number of other features Torsten and I have in mind. As Brent Simmons wrote

This experience was a reminder for me of how unimportant the underlying syndication formats are, in a way. What percent of time does an aggregator developer spend on RSS and Atom parsing code? 50%? 25%? 10%?

I figure it’s somewhere less than 1%.

The rest of the time is taken up with things like data storage, networking, and user interface. But mostly user interface. Not just implementing—which is often easy—but designing user interface, which is difficult.

In other RSS Bandit news Torsten is almost done with some code that fixes our #2 performance problem in RSS Bandit and Phil Haack has started work on official RSS Bandit documentation. Excellent work.

All of the above should show up in the next RSS Bandit release. Phil's documentation will most likely reside on the RSS Bandit Documentation Page on SourceForge and will be linked to from the RSS Bandit help menu.


 

Categories: RSS Bandit

February 26, 2004
@ 05:54 PM

Aaron Swartz has lots of interesting ideas about politics and copyright in the age of digital media. I disagree with a lot of his ideas on both but they are often well thought out and interesting. This month he continues his trend of interesting posts about politics with two entries. The first is Up is Down: How Stating the False Hides the True, excerpted below

One of the more interesting Republican strategies is saying things whose opposite is true. They say that the Democratic nominee is bought off by special interests, the Democrats are outspending them, the Democrats are playing dirty, the Democrats don’t care about homeland security, the Democrats hate America, all when this is far more true of the Republicans. They say Joseph McCarthy was a noble man, the media has a liberal bias, affirmative action is bad for equality, Saddam had weapons of mass destruction, and Ronald Reagan was our greatest President, all when the opposite is far more true.

At first glance this seems bizarre — why draw attention to your weaknesses? But it’s actually a very clever use of the media. The media tries hard to be “fair and balanced”, and it generally believes the best way to do this is to present the opinions from both sides and make as few judgement calls as possible (to avoid introducing their own bias). And if there’s a debate on some issue, taking a side is seen as a judgement call.

and Down is Up: What This Stuff Is where he writes

I got a lot of responses to my previous post, Up is Down, along the lines of “oh, the Democrats lie as much as the Republicans”. But the piece was not about lies. For lack of a better term, it was about anti-truths. Anti-truths have two parts:

  1. They’re completely false.
  2. They’re more accurate when directly reversed.

It’s hard to find a completely unobjectionable one, but take “Ronald Reagan was our greatest President.” As for part one, I have seen no evidence that Reagan actually did anything particularly good on purpose and as for two, “Ronald Reagan was our worst President” seems to be a far more accurate statement, since he did lots of things that were quite bad.

My example of an anti-truth would have been “John Ashcroft respects the US constitution”. :)


 

February 26, 2004
@ 05:45 PM

In his post WinFS Scenario #2: event planning Jeremy Mazner writes

So as you can see, the information for any given event is spread all over the place, which means I can’t really keep track of it all.  If I want to see the invite list and RSVP status, I’m either in Outlook (if I’m lucky) or off to some file share to find the spreadsheet.  If I want to see some information about an invitee (their phone number, or who their account manager is), it’s off to the directory or CRM system to look it up.  If I want to know what presentations have been approved for use, I crawl through email to find the one message from my general manager where he says the presentation is ready.  If I want to see the actual presentation, it’s back to a file share or Sharepoint

What I really want is a way to corral all these related items together, stick them in a big bucket with a label that says what event they’re for.  I want a simple UI where I can see what events are coming up, then browse through all the related material for each one, and maybe be able to answer some simple questions: how many presentations for this event are in the Approved state?  How many attendees have declined the invitation?

 

I’ll assert that this is really, really hard to do today.  Outlook wizards would probably argue that you could do this with some series of catagories, public folders, and shared calendars.  SharePoint gurus would say this is exactly what a Meeting Workspace is for.  Old school event planners might claim you could track this all in a nice big spreadsheet with multiple pages and links out to file locations.

This is a valid problem that Jeremy brings up in his scenario and one I've thought about in the past when it comes to tying together information about the same person from disparate applications. Outlook comes closest to doing what I'd like to see here but it gets help from Exchange. I was curious as to how Jeremy thought WinFS could help here, and his thoughts were similar to what I first thought when I heard about WinFS. Specifically he wrote

What does WinFS provide that will help?

  • A common storage engine, one unified namespace for storage of any application data on your machine.  Whether I use Outlook, AOL Communicator, or Notes, my emails can all be stored in WinFS.  (Yes, I understand that new versions of these apps will have to built…encouraging that is my job as evangelist.)
  • A set of common schemas, so that an email is an email is an email, no matter what app created it, and it always has a To:, From: and Subject: that I can access through the same API.
  • A data model that supports relationships, so that my event management app can specify “this email from Bill is related to the Longhorn Design Review event, as is this calendar appointment for next month”
  • A data model that supports extensions and meta-data on relationships, so that I not only say “this contact Jon is associated with this Design Review event”, but also “Jon is a speaker at this event” and “Jon is the author of this deck that he’ll present” and “Jon has not yet confirmed attendance at the party afterwards.”
  • Win32 file system access, so that even though files are stored in WinFS, applications can still get to their streams via a Win32 path

It seems Jeremy and I had the same thoughts about how WinFS could help in this regard. After thinking about the problem for a little bit, I realized that having all applications store similar information in a central repository brings a number of problems with it. The two main problems I can see are unreliable applications that cause data corruption, and security. As an example, imagine if RSS Bandit, SharpReader, and FeedDemon all stored their data in WinFS using the model that Jeremy describes above. This means all RSS/Atom feeds and configuration data used by each application would not be stored in application-specific folders as is done today, but in a unified store of RSS items. Here are two real problems that would need to be surmounted

  1. In the past, bugs in RSS Bandit that led to crashes also caused corrupted feed files. It's one thing for bugs in an application to corrupt its own data or configuration files, but another for them to corrupt globally shared data. This is akin to the pain users and developers have felt when buggy apps corrupt the registry. The WinFS designers will have to account for this occurrence in some way, even if it is just by coming up with application design guidelines.

  2. For feeds that require authentication, RSS Bandit stores the user name and password required to access the feed in configuration files. Having such data globally shared means that proper security precautions must be taken. Just because a user has entered a password in RSS Bandit doesn't mean they want it exposed to any other application, especially potentially malicious ones. Of course, this is no different from what can happen today in Windows (e.g. most modern viruses/worms search the file system for email address book files to locate new victims to email themselves to) but with the model being pushed by WinFS this becomes a lot easier. A way to share data between WinFS-aware applications in a secure manner is a must have.

Neither of these problems is insurmountable, but they do point out that things aren't as easy as one suspects at first glance at WinFS. I pass Mike Deem in the hallway almost daily; I should talk to him about this stuff more often. In the meantime I'll be sure to catch the various WinFS designers on the next episode of the .NET Show on MSDN.


 

Categories: Technology

February 25, 2004
@ 12:16 PM

In his post JDOM Hits Beta 10 Jason Hunter writes

According to my Palm Pilot calendar, we laid out the vision for JDOM on March 28th, 2000. I figure we'll ship before March 28, 2004. If we can ship 1.0 before it's been a full four years, I can just round down and call it three. :-)

What took it so long? Several things. I discovered XML is "fractally complex". At the highest level it appears only slightly complicated, but as you dig deeper you discover increasing complexity, and the deeper you go the more complicated it continues to become. Trying to be faithful to the XML standards while staying easy to use and intuitive was a definite challenge.

This is one of the challenges I face in my day job designing XML APIs for the .NET Framework. The allure of XML and its related technologies is that they appear simple and straightforward, but once one digs a little it turns out that everything isn't quite as easy as it seemed at first.

One of the drawbacks of this appearance of simplicity is that everyone thinks they can write an XML parser, which leads to occurrences such as what is described in this post by Shawn Farkas, Creating a SecurityElement from XML

The overhead of a full-fledged XML parser would be too much. Even if you accept the fact that we need a lightweight security XML object, we can't even provide utility methods on SecurityElement to convert back and forth System.Xml objects, since the CAS code lives in mscorlib.dll, and mscorlib cannot take a dependency on external DLL's. (Think of what would happen if mscorlib depended on System.Xml.dll, and System.Xml.dll depended on mscorlib ...). As if this weren't enough, there are at least 3 distinct XML parsers in v1.1 of the framework (System.Xml, SecurityElement, and a lightweight parser in mscoree.dll which handles parsing .config files ... this was actually optimized to be able to fit into no more than two pages of memory). Whidbey will be adding yet another parser to handle parsing ClickOnce manifests

One of the things I'm currently working on is coming up with guidelines that prevent occurrences like System.Security.SecurityElement, a class that represents XML but does not interact well with the rest of the XML APIs in the .NET Framework, from happening again. This will be akin to Don Box's MSDN TV episode Passing XML Data in the CLR but will take the form of an Extreme XML article and a set of .NET Framework design guidelines.
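As an aside, since SecurityElement and System.Xml share no common interfaces, the only way to move between the two object models today is to round-trip through text. A minimal sketch of one direction:

using System.Security;
using System.Xml;

public class SecurityElementBridge
{
    public static XmlElement ToXmlElement(SecurityElement se)
    {
        // SecurityElement.ToString() emits the element as XML text,
        // which the full-fledged parser can then load.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(se.ToString());
        return doc.DocumentElement;
    }
}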


 

Categories: XML

February 22, 2004
@ 09:08 PM

Yesterday my mom and I went on a train ride that is often billed as a way for couples to spend a special occasion. The train was full of couples celebrating anniversaries, birthdays and other special occasions. Quite a number of couples were making out openly at the end of the train ride, whose main features are a picturesque dinner on the train and a stop with a tour of a local winery.

One of the less romantic aspects of this train ride is that for the most part you have to share a table with another couple, facing them. This means they get to overhear, and interrupt, your conversation. The couple we shared our table with were celebrating the guy's birthday and his girlfriend was treating him to a special day out that ended with the train ride. After we got back on the train from the winery tour the unexpected happened. They were engaged in conversation and he was comparing her favorably to ex-girlfriends, then all of a sudden he got down on one knee and pulled out a box with a ring in it. After a stunned silence she took it, said some words softly, then said “I appreciate the sentiment but the timing is inappropriate” and handed it back. This was followed by her voicing her concerns about his ability to support them and him rattling off how much he made a month plus various bonuses, etc. I think it went downhill from there.

All through this I was staring out the window trying to make small talk with my mom but failing miserably. If and when I do end up proposing to someone I've definitely learned a thing or two about what not to do.


 

Categories: Ramblings

Jon Udell writes in his entry Heads, decks, and leads: revisited

Yesterday, for example, Steve Gillmor told me that he's feeling overwhelmed by thousands of unread items in NetNewsWire. Yet I never feel that way. I suspect that's because I'm reading in batches of 100 (in the Radio UserLand feedreader). I scan each batch quickly. Although opinions differ as to whether or not a feed should be truncated, my stance (which I'm reversing today) has been that truncation is a useful way to achieve the effect you get when scanning the left column of the Wall Street Journal's front page. Of the 100 items, I'll typically only want to read several. I open them into new Mozilla tabs, then go back and read them. Everybody's different, but for me -- and given how newspapers work, I suspect for many others too -- it's useful to separate the acts of scanning and reading. When I'm done with the batch, I click once to delete all 100 items.

and in today's post entitled Different strokes he writes

I agree. In trying to illustrate a point about scanning versus reading, I'm afraid I fanned the flames of the newsreader-style versus browser-style debate. In fact, the two modes can be complementary. I just bought the full version of NetNewsWire, which exploits that synergy as Brent describes. So does FeedDemon, which this posting prompted me to re-explore.

This highlights a conflict between the traditional 3-pane aggregators, which follow the mail or news reader model that implies every post is important and should be read one by one, and web-style aggregators like Radio UserLand, which present blogs in a unified web-based view reminiscent of an aggregated blog or newspaper. On the RSS Bandit wiki there's a wishlist item that reads

Newspaper view. A summery of unread feed items, formatted by a XSLT stylesheet and displayed as HTML/PDF. Inspired by Don Park. also here

which was originally added by Torsten. He never got around to adding this feature because he felt it wasn't that useful after all. I never implemented it because one would have to provide a way to interact with posts from this newspaper view (i.e. mark them as read or deleted, view comments, etc.) which translates to either Javascript coding or running a local web server. Neither of those options was palatable.

This morning I downloaded FeedDemon to see how it got around these problems for its newspaper view. I found out that it does the obvious thing: it doesn't. From what I gather there is an option to 'mark all items in a channel as read' once you leave the channel, so once you close the newspaper view it assumes every post that showed up in it was read. A heavy-handed approach, but it probably works for the most part.
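For reference, the rendering half of a newspaper view is the easy part. A minimal sketch, assuming a newspaper.xsl stylesheet and a file of cached feed items (both names invented); the hard part of marking items read, viewing comments and so on still needs a UI story:

using System.Xml.Xsl;

public class NewspaperView
{
    public static void Main()
    {
        // Transform the cached feed items into a single HTML "newspaper" page.
        XslTransform transform = new XslTransform();
        transform.Load("newspaper.xsl");
        transform.Transform("cached-items.xml", "newspaper.html");
    }
}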

Looks like something else to add to the RSS Bandit TODO list.

I've been thinking that something like this is necessary after reading Robert Scoble's post 1296 newsfeeds +are+ sustainable where he wrote

Here's my workflow:

At about 5 p.m. every day I tell NewsGator to get me my feeds. It is downloading them in the background as I speak.

Then I open each folder that's bold...

Then I only read the headlines. I'm getting very good at ignoring headlines with subjects like "isn't my cat cute?" See, that's another productivity point. Robin probably assumes I read all the crap that people post. I don't. I only read those things that MIGHT be interesting. If I find a headline that's interesting, then I scan the article it is associated with. I don't read it. Just scan at that point. Usually that means reading the first paragraph and scanning the rest for later.

I've found that reading headlines isn't always the best way to find good stuff and wouldn't mind a way to quickly scan all the articles in a category that goes beyond eyeballing a bunch of headlines. However I'm going to avoid Jon Udell's advice that XHTML-izing all the HTML content in feeds is the way to get there. Been there, done that, not going back. The approach used by FeedDemon is a step in the right direction and doesn't require absorbing the problems that come with trying to convert the ill-formed markup that typically shows up in feeds to XHTML.


 

Categories: RSS Bandit | XML

In his post entitled Back in the Saddle Don Box writes

My main takeaway was that it's time to get on board with Atom - Sam is a master cat herder and I for one am ready to join the other kittens. 

This is good news. Anyone who's read my blog can probably discern that I think the ATOM syndication format is a poorly conceived waste of effort that unnecessarily fragments the website syndication world. On the other hand, the ATOM API, especially the bits about SOAP-enabled clients, is a welcome upgrade to the existing landscape of blog posting/editing APIs.

My experiences considering how to implement the ATOM API in RSS Bandit have highlighted one or two places where the API seems 'problematic', which actually point more to holes in the XML Web Services architecture than to actual problems with the API. The two scenarios that come most readily to mind are

  1. Currently if a user wants to post a comment to their blog using client software they need to configure all sorts of technical settings such as which API to use, port numbers, end point URLs and a lot more. For example, look at what one has to do to post to a dasBlog weblog from w.bloggar. Ideally, the end user should just be able to point their client at their blog URL (e.g. http://www.25hoursaday.com/weblog) and have it figure out the rest.

    The current ATOM specs describe a technique for discovering the web service end points a blog exposes which involves downloading the HTML page and parsing out all the <link> tags (a rough sketch of this appears after this list). I've disagreed with this approach in the past but the fact is that it does get the job done.

    What this situation has pointed out to me is that there is no generic way to go up to a website and find out what XML Web Service end points it exposes. For example, if you wanted to know all the publicly available Web Services provided by Microsoft you'd have to read Aaron Skonnard's A Survey of Publicly Available Web Services at Microsoft instead of somehow discovering this programmatically. Maybe this is what UDDI was designed for?

  2. Different blogs allow different syntax for posting comments. I've lost count of the number of times I've posted a comment to a blog and wanted to provide a link but couldn't tell whether to just use a naked URL (http://www.example.com) or a hyperlink (<a href="http://www.example.com">example link</a>). Given that RSS Bandit has supported the CommentAPI for a while now, I've constantly been frustrated by the inability to tell what kind of markup or markup subset a blog allows in comments. A couple of blogs provide formatting rules when one is posting a comment but there really is no programmatic way to discover this.

    Another class of capabilities I'd like to discover dynamically is which features a blog supports. For instance, the ATOM API spec used to have a 'Search facet' which was removed because many people thought it'd be onerous to implement. What I'd have preferred would have been for it to be optional; then clients could dynamically discover whether the ATOM end point had search capabilities and, if so, how rich they were.

    The limitation here is that there isn't a generic way to discover and enunciate the fine grained capabilities of an XML Web Service end point. At least not one I am familiar with.
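Regarding the end point discovery technique in the first scenario, it boils down to scraping <link> tags out of the weblog's home page. A rough sketch; the rel value follows the current ATOM API drafts, and robust handling of ill-formed HTML is glossed over:

using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public class AtomEndpointDiscovery
{
    public static string FindPostEndpoint(string blogUrl)
    {
        using (StreamReader reader = new StreamReader(
                   WebRequest.Create(blogUrl).GetResponse().GetResponseStream()))
        {
            string html = reader.ReadToEnd();
            // Looks for <link rel="service.post" type="application/x.atom+xml" href="..."/>
            Match m = Regex.Match(html,
                "<link[^>]+rel=\"service.post\"[^>]+href=\"(?<href>[^\"]+)\"",
                RegexOptions.IgnoreCase);
            return m.Success ? m.Groups["href"].Value : null;
        }
    }
}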

It would be nice to see what someone like Don Box can bring to the table in showing how to architect and implement such a loosely coupled XML Web Service based system on the World Wide Web.


 

Categories: Technology | XML

February 21, 2004
@ 01:23 PM

Torsten and I have fixed about a dozen bugs since the v1.2.0.90 release and implemented one or two minor features. There are two major issues we'd like to tackle in the next few weeks, then ship a minor release, then work on the next major version of RSS Bandit. The issues are both performance related

  1. High Memory Consumption: We don't consume as much memory as the other free .NET Framework based aggregator, SharpReader, which I've seen consume over 100MB of memory, but we do stay in the 30MB to 60MB range, which is excessive. I'm pretty sure I have a good idea what the prime culprits are for the memory issues but currently can't think of a good way to reduce the memory consumption without removing or degrading some features. However, our perf goal is to cut those numbers in half in the next few weeks.

  2. Feeds with Lots of Posts Take Too Long to Load: This is related to one of the culprits in the previous problem. If you are subscribed to a feed such as Weblogs @ ASP.NET, which gets 50-100 posts a day (about 1500-3000 posts a month), then there is a perceptible slowdown in how long it takes to load the listview when you click on the feed.

We'll be fixing bugs and implementing minor features along the way while getting the aforementioned performance issues under control. Once we are satisfied with the perf, we'll have a beta period and then ship a release. This should be within the next month.

After this there are a number of significant features we have slated such as NNTP support, subscription harmonization using SIAM, translations to multiple languages (German being the first) and better integration with IE (such as supporting the Favorites menu).  The release where these features show up will be in 2 or 3 months.

In the meantime, Torsten and I will be discussing RSS Bandit development in our blogs and on the RSS Bandit mailing list


 

Categories: RSS Bandit

Torsten figured out how to place background images in our XSLT styled themes for RSS Bandit this past week. As a celebration, we'll be including the following theme in the default install.

Personally, I'd have preferred Torsten's Christina Aguilera theme. :)


 

Categories: RSS Bandit

Daniel Cazzulino is writing about the W3C XML Schema type system vs. the CLR type system and has an informal poll at the bottom of his article where he writes

We all agree that many concepts in WXS don't map to anything existing in OO languages, such as derivation by restriction, content-ordering (i.e. sequence vs choice), etc. However, in the light of the tools the .NET Framework makes available to map XML to objects, we usually have to analyze WXS (used to define the structure of that very XML instance to be mapped) and its relation with our classes
In this light, I'm conducting a survey about developer's view on the relation of the XSD type system and the .NET one. Ignoring some of the more advanced (I could add cumbersome and confusing) features of WXS, would you say that both type systems fit nicely with each other?

I find the question at the end of his post, which I highlighted, to be highly tautological. His question is basically, “If you ignore the parts where they don't fit well together, do the CLR and XSD type systems fit well together?”. Well, if you ignore the parts where they don't fit, then the only answer is YES. In reality many developers don't have the freedom to ignore the parts of XSD they don't want to support, especially when utilizing XML Web Services designed by others.

There are two primary ways one can utilize the XmlSerializer, which maps between XSD and CLR types

  1. XML Serialization of Object State: In this case the developer is only interested in ensuring that the state of his classes can be converted to XML. This is a fairly simple problem because the expressiveness of the CLR is a subset of that of W3C XML Schema. Any object's state can be mapped to an element of complex type containing a sequence or choice of other nested elements that are themselves either simple types or complex types.

    Even then there are limitations in the XmlSerializer which make this cumbersome, such as the fact that it only serializes public read/write fields and properties. But that is just a design decision that can be revisited in future releases. (A minimal sketch of this scenario appears after this list.)

  2. Conversion of XML to Objects: This is the scenario where a developer converts an XML schema to CLR objects to make the XML easier to program against. This is particularly common in XML Web Services scenarios, which is what the XmlSerializer was originally designed for. In this scenario the conversion tool has to contend with the breadth of features in the XML Schema: Structures and XML Schema: Datatypes recommendations.

    There are enough discrepancies between the W3C XML Schema type system and that of the CLR to fill a Ph.D thesis. I touched on some of these in my article XML Serialization in the .NET Framework such as

    Q: What aspects of W3C XML Schema are not supported by the XmlSerializer during conversion of schemas to classes?

    A: The XmlSerializer does not support the following:

    • Any of the simple type restriction facets besides enumeration.
    • Namespace based wildcards.
    • Identity constraints.
    • Substitution groups.
    • Blocked elements or types.

    After gaining more experience with the XmlSerializer and talking to a number of customers I wrote some more about the impedance mismatches in my article XML Schema Design Patterns: Is Complex Type Derivation Unnecessary?, specifically

    For usage scenarios where a schema is used to create strongly typed XML, derivation by restriction is problematic. The ability to restrict optional elements and attributes does not exist in the relational model or in traditional concepts of type derivation from OOP languages. The example from the previous section where the email element is optional in the base type, but cannot appear in the derived type, is incompatible with the notion of derivation in an object oriented sense, while also being similarly hard to model using tables in a relational database.

    Similarly changing the nillability of a type through derivation is not a capability that maps to relation or OOP models. On the other hand, the example that doesn't use derivation by restriction can more straightforwardly be modeled as classes in an OOP language or as relational tables. This is important given that it reduces the impedance mismatch which occurs when attempting to map the contents of an XML document into a relational database or convert an XML document into an instance of an OOP class
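To make the first scenario concrete, here is a minimal sketch of serializing object state with the XmlSerializer; the class, its members and the file name are invented for illustration:

using System;
using System.IO;
using System.Xml.Serialization;

public class FeedSubscription
{
    // Public members map to nested elements of a complex type in the inferred schema.
    public string Title;
    public string Url;
    public int MaxItemAgeInDays;
}

public class SerializationExample
{
    public static void Main()
    {
        FeedSubscription feed = new FeedSubscription();
        feed.Title = "Dare Obasanjo aka Carnage4Life";
        feed.Url = "http://www.25hoursaday.com/weblog/SyndicationService.asmx/GetRss";
        feed.MaxItemAgeInDays = 30;

        XmlSerializer serializer = new XmlSerializer(typeof(FeedSubscription));
        using (StreamWriter writer = new StreamWriter("subscription.xml"))
        {
            serializer.Serialize(writer, feed); // writes the object's state as XML
        }
    }
}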

I'm not the only one at Microsoft who's written about this impedance mismatch or tried to solve it. Gavin Bierman, Wolfram Schulte and Erik Meijer devoted an entire section to it in their paper Programming with Circles, Triangles and Rectangles. Below are links to descriptions of a couple of the mismatches they found most interesting

The mismatch between XML and object data-models
     Edge-labelled vs. Node-labelled
     Attributes versus elements
     Elements versus complex and simple types
     Multiple occurrences of the same child element
     Anonymous types
     Substitution groups vs derivation and the closed world assumption
     Namespaces, namespaces as values
     Occurence constraints part of container instead of type
     Mixed content

There is a lot of discussion one could have about the impedance mismatch between the CLR type system and the XSD type system but one thing you can't say is that it doesn't exist or that it can be ignored if building schema-centric applications.

In conclusion, if one is mapping objects to XML for the purpose of serializing their state then there is a good match between the CLR and XSD type systems, since the XSD type system is more expressive than the CLR type system. On the other hand, if one is trying to go from XSD to the CLR type system there are significant impedance mismatches. Some of these are limitations of the current tools (e.g. the XmlSerializer could generate range checks for derivation by restriction of simple types or uniqueness tests for identity constraints) while others are fundamental differences between the XSD type system and object oriented programming, such as the difference between derivation by restriction in XSD and type derivation in OOP languages.


     

    Categories: XML

    Recently ZDNet ran an article entitled Google spurns RSS for rising blog format where it stated

    The search giant, which acquired Blogger.com last year, began allowing the service's million-plus members to syndicate their online diaries to other Web sites last month. To implement the feature, it chose the new Atom format instead of the widely used, older RSS.

    I've seen some discussion about the fact that Google only provides feeds for certain blogs in the ATOM 0.3 syndication format, an interim draft of the spec that is part of an effort being driven by Sam Ruby to replace RSS and related technologies. When I first read this I ignored it because I didn't have any Blogger.com feeds that were of interest to me. That changed today. This afternoon I found out that Steve Saxon, the author of the excellent article XPath Querying Over Objects with ObjectXPathNavigator, has a Blogger.com blog that only provides an ATOM feed. Since I use RSS Bandit as my aggregator of choice I cannot subscribe to his feed, nor could I use a large percentage of the existing news aggregators to read it.

    What I find particularly stunning about Google's decision is that they have removed support for an existing, widely supported format in favor of an interim draft of a format which, according to Sam Ruby's slides for the O'Reilly Emerging Technology Conference, is several months away from being completed. An appropriate analogy for what Google has done would be AOL abandoning support for HTML and changing all of its websites to use the May 6th 2003 draft of the XHTML 2.0 spec. It simply makes no sense.

    Some people, such as Dave Winer, believe Google is engaging in such user-unfriendly behavior for malicious reasons, but given that Google doesn't currently ship a news aggregator there doesn't seem to be much of a motive there (of course, this changes once they ship one). I recently stumbled across an article entitled The Basic Laws of Human Stupidity which described the following 5 laws

    1. Always and inevitably everyone underestimates the number of stupid individuals in circulation.

    2. The probability that a certain person be stupid is independent of any other characteristic of that person.

    3. A stupid person is a person who causes losses to another person or to a group of persons while himself deriving no gain and even possibly incurring losses.

    4. Non-stupid people always underestimate the damaging power of stupid individuals. In particular non-stupid people constantly forget that at all times and places and under any circumstances to deal and/or associate with stupid people always turns out to be a costly mistake.

    5. A stupid person is the most dangerous type of person.

    The only question now is: is Google crazy, or crazy like a fox? Only time will tell.


     

    Categories: Technology

    Dave Winer recently wrote that at least one person has asked if it is safe to ignore Atom in his weblog. If you are a cautious person like Tim Bray's Mr. Safe, or you fit more on the right than the left side of the Technology Adoption Life Cycle, then you are probably wondering why you should want to support the Atom syndication format over one of the many flavors of RSS. There are two parts to this question, depending on whether you are a producer of syndication feeds or a consumer of them.

    The Safe Syndication Producer's Perspective
    An RSS feed is a regularly updated XML document that contains metadata about a news source and the content in it. Minimally, an RSS feed consists of a channel that represents the news source, which has a title, link, and description that describe the news source. Additionally, an RSS feed typically contains one or more item elements that represent individual news items, each of which should have a title, link, or description.
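    As an illustration, here is a minimal sketch of producing such a feed with the .NET Framework's XmlTextWriter; the titles and URLs are placeholders:

    using System;
    using System.Text;
    using System.Xml;

    public class MinimalRssFeed
    {
        public static void Main()
        {
            XmlTextWriter writer = new XmlTextWriter("feed.xml", Encoding.UTF8);
            writer.Formatting = Formatting.Indented;

            writer.WriteStartElement("rss");
            writer.WriteAttributeString("version", "2.0");
            writer.WriteStartElement("channel");

            // Channel metadata describing the news source.
            writer.WriteElementString("title", "Example Weblog");
            writer.WriteElementString("link", "http://www.example.com/weblog");
            writer.WriteElementString("description", "A placeholder news source");

            // A single news item; real feeds typically contain many.
            writer.WriteStartElement("item");
            writer.WriteElementString("title", "Hello syndication world");
            writer.WriteElementString("link", "http://www.example.com/weblog/posts/1");
            writer.WriteElementString("description", "The body of the news item goes here.");
            writer.WriteEndElement(); // item

            writer.WriteEndElement(); // channel
            writer.WriteEndElement(); // rss
            writer.Close();
        }
    }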

    There are two primary flavors of RSS: Dave Winer's family of specifications (the most popular being RSS 0.91 & RSS 2.0) and the RDF-based RSS 1.0. The more popular is Dave Winer's family of specifications, which have been adopted by a number of well-known organizations such as Yahoo! News, the BBC, Rolling Stone magazine, the Microsoft Developer Network (MSDN), the Oracle Technology Network (OTN), the Sun Developer Network and Apple's iTunes Music Store. According to Syndic8, which tracks over 50,000 RSS feeds, RSS 0.91, RSS 1.0 & RSS 2.0 each have about 30% of the RSS market share.

    Most news aggregators support all 3 major versions of RSS, although few actually take advantage of the fact that RSS 1.0 is an RDF vocabulary. If all one wants is simple syndication of news items then RSS 0.91 should be satisfactory. If one plans to use extensions to the core RSS specification that expose application- or domain-specific functionality, such as the ability to post comments, one can use one of the many RSS modules in combination with RSS 2.0. The only advantage that RSS 1.0 gives over RSS 0.91/RSS 2.0 is that it is an RDF vocabulary and thus fits nicely into the dream of the Semantic Web.

    The Atom syndication format can be considered a more sophisticated implementation of the ideas in RSS 2.0. It adds richer syndication capabilities, such as the ability to put binary formats like Word and PowerPoint documents in feeds, and formalizes some of the best practices in the RSS world around putting [X]HTML in feeds.

    The average user of a news aggregator will not be able to tell the difference between an Atom and an RSS feed if their aggregator supports both. However, users of aggregators that don't support Atom will not be able to subscribe to feeds in that format. In a few years, the differences between RSS and Atom will most likely matter as little as the differences between RSS 1.0 and RSS 0.91/RSS 2.0 do today; they will only be of interest to a handful of XML syndication geeks. Even then, the simplest and safest bet would still be to use RSS as a syndication format. This is the same as the fact that even though the W3C has published XHTML 1.0 & XHTML 1.1 and is working on XHTML 2.0, the safest bet for getting the widest reach with the least problems is to publish a website in HTML 3.2 or HTML 4.01.

    The Safe Syndication Consumer's Perspective
    If you plan to consume feeds from a wide variety of sources then you should endeavor to support as many syndication formats as possible. The more formats a feed consumer supports, the more content is available to its users.

    Based on their current popularity, degree of support and ease of implementation one should consider supporting the major syndication formats in the following order of priority

    1. RSS 0.91/RSS 2.0
    2. RSS 1.0
    3. Atom

    RSS 0.91 support is the simplest to implement and the most widely supported by websites, while Atom is the most difficult to implement, being the most complex, and will be the least supported by websites in the coming years.


     

    Categories: XML

    February 18, 2004
    @ 04:42 PM

    This is here mainly for me to be able to look back on in a few years and for any new readers of my blog who wonder what I actually do at Microsoft.

    I am a program manager for the WebData XML team. The WebData team is part of the SQL Server Product Unit and produces the major data access technologies that Microsoft ships, including MDAC, MSXML, ADO.NET, System.Xml, ObjectSpaces and the WinFS API.

    As a technical program manager I am responsible for the nitty gritty of the design of the classes in the following namespaces in the .NET Framework

    Nitty gritty design details means stuff like triaging bug fixes, designing new features or new classes, writing specifications, and interacting with internal & external customers to discover their likes and dislikes about the APIs in question.

    I am also the community lead for the WebData XML team, which means I am responsible for things like the XML Most Valuable Professional (MVP) program and the upcoming MSDN XML Developer Center. For the MVP program I am the primary point of contact between my team and the Microsoft MVP program, as well as our MVPs. I am also one of the folks who approves or rejects nominees. As for the developer center, I am the equivalent of what MSDN likes to call a “content strategist”, which basically means I am responsible for the content on the site. For the most part I am also the primary point of contact between my team and MSDN.

    If you have any issues or questions related to the aforementioned aspects of my job at Microsoft (e.g. bug reports, feature requests or questions about writing for MSDN), feel free to ping me at my work email address. If you don't know it, you should be able to find it with a minute or two of Googling.


     

    Categories: Life in the B0rg Cube

    February 18, 2004
    @ 06:28 AM

    Chris Sells writes

    On his quest to find "non-bad WinFS scenarios" (ironically, because he was called out by another Microsoft employee -- I love it when we fight in public : ), Jeremy Mazner, Longhorn Technical Evangelist, starts with his real life use of Windows Movie Maker and trying to find music to use as a soundtrack. Let Jeremy know what you think.

    I think the scenario is compelling. In fact, the only issue I have with the WinFS scenario that Jeremy outlines is that he implies that the metadata about music files that Windows Media Player exposes is tied to the application, but in truth most of it is tied to the actual media files, either as regular file info [file location, date modified, etc.] or as ID3 tags [album, genre, artist, etc.]. This means that there doesn't even need to be explicit inter-application sharing of data.

    If the file system had a notion of a music item which exposed the kind of information one sees in ID3 tags, and this was also exposed by the shell in standard ways, then you could do lots of interesting things with music metadata without even trying hard. I also find it quite compelling because metadata attached to music files is such low-hanging fruit that one can get immediate value out of it, and it already exists today on the average person's machine.


     

    Categories: Technology

    The folks at Slashdot have Indian Techies Answer About 'Onshore Insourcing'. Excellent stuff.


     

    I just saw an entry in Ted Leung's blog about SMS messages where he wrote

    [via Trevor's ETech notes]

    become rude to make a phone call without first checking via sms. [this is becoming more and more the case in europe also]

    I would love it if this became the etiquette here in the US as well. For all telephone calls, not just cell calls. People seem to believe that they have the right to call you simply because you have a telephone.

    The so-called SMS craze that has hit Europe and Asia seems totally absurd to me. I can understand teenagers and college students using SMS as a more sophisticated way of passing notes to each other in class, but I can't see any other reason why, if I have a device I could use to talk to someone, I'd instead send them a hastily written and poorly spelled text message. Well, maybe if text messages were free and voice calls were fairly expensive, but since that isn't the case in the US I guess that's why I don't get it.


     

    Categories: Ramblings

    Daniel Cazzulino has been writing about his work on XML Streaming Events, which combines the ability to do XPath queries with the .NET Framework's forward-only, pull-based XML parser. He shows the following code sample

    // Setup the namespaces
    XmlNamespaceManager mgr = new XmlNamespaceManager(temp.NameTable);
    mgr.AddNamespace("r", RssBanditNamespace);

    // Precompile the strategy used to match the expression
    IMatchStrategy st = new RootedPathFactory().Create(
        "/r:feeds/r:feed/r:stories-recently-viewed/r:story", mgr);

    int count = 0;

    // Create the reader.
    XseReader xr = new XseReader( new XmlTextReader( inputStream ) );

    // Add our handler, using the strategy compiled above.
    xr.AddHandler(st, delegate { count++; });

    while (xr.Read()) { }

    Console.WriteLine("Stories viewed: {0}", count);

    I have a couple of questions about his implementation, the main one being how it deals with XPath queries such as /r:feeds/r:feed[count(r:stories-recently-viewed) > 10]/r:title which can't be evaluated in a forward-only manner.

    Oleg Tkachenko also pipes in with some opinions about streaming XPath in his post Warriors of the Streaming XPath Order. He writes

    I've been playing with such beasts, making all kinds of mistakes and finally I came up with a solution, which I think is good, but I didn't publish it yet. Why? Because I'm tired to publish spoilers :) It's based on "ForwardOnlyXPathNavigator" aka XPathNavigator over XmlReader, Dare is going to write about in MSDN XML Dev Center and I wait till that's published.

    May be I'm mistaken, but anyway here is the idea - "ForwardOnlyXPathNavigator" is XPathNavigator implementation over XmlReader, which obviously supports forward-only XPath subset...

    And after I played enough with and implemented that stuff I discovered BizTalk 2004 Beta classes contain much better implementation of the same functionality in such gems as XPathReader, XmlTranslatorStream, XmlValidatingStream and XPathMutatorStream. They're amazing classes that enable streaming XML processing in much rich way than trivial XmlReader stack does. I only wonder why they are not in System.Xml v2 ? Is there are any reasons why they are still hidden deeply inside BizTalk 2004 ? Probably I have to evangelize them a bit as I really like this idea.

    Actually Oleg is both closer to and farther from the truth than he realizes. Although I wrote about a hypothetical ForwardOnlyXPathNavigator in my article entitled Can One Size Fit All? for XML Journal, my planned article, which should show up when the MSDN XML Developer Center launches in a month or so, won't be using it. Instead it will be based on an XPathReader that is very similar to the one used in BizTalk 2004; in fact it was written by the same guy. The XPathReader works similarly to Daniel Cazzulino's XseReader but uses the XPath subset described in Arpan Desai's Introduction to Sequential XPath paper instead of adding proprietary extensions to XPath as Daniel's does.

    When the article describing the XPathReader is done it will provide source code, and if there is interest I'll create a GotDotNet Workspace for the project, although it is unlikely that I or the dev who originally wrote the code will have time to maintain it.


     

    Categories: XML

    February 15, 2004
    @ 05:50 PM

    A few months ago I attended XML 2003, where I first learned about Semantic Integration, which is the buzzword term for mapping data from one schema to another with a heavy focus on using Semantic Web technologies such as ontologies and the like. The problem these technologies solve is enabling one to map XML data from external sources to a form that is compatible with how an application or business entity manipulates it internally.

    For example, in RSS Bandit we treat feeds in memory and on disk as if they are in the RSS 2.0 format even though the application supports other flavors of RSS as well, such as RSS 1.0. Proponents of semantic integration technologies would suggest using a technology such as the W3C's OWL Web Ontology Language. If you are unfamiliar with ontologies and how they apply to XML, a good place to start is the OWL Web Ontology Language Use Cases and Requirements. The following quote from the OWL Use Cases document gives a glimpse into the goal of ontology languages

    In order to allow more intelligent syndication, web portals can define an ontology for the community. This ontology can provide a terminology for describing content and axioms that define terms using other terms from the ontology. For example, an ontology might include terminology such as "journal paper," "publication," "person," and "author." This ontology could include definitions that state things such as "all journal papers are publications" or "the authors of all publications are people." When combined with facts, these definitions allow other facts that are necessarily true to be inferred. These inferences can, in turn, allow users to obtain search results from the portal that are impossible to obtain from conventional retrieval systems

    Although the above example talks about search engines, it is clear that one can also use this approach for data integration. In the case of RSS Bandit, one could create an ontology that maps the terms in RSS 1.0 to those in RSS 2.0 and make statements such as

    RSS 1.0's <title> element sameAs RSS 2.0's <title> element 

    Basically, one could imagine the schemas for RSS 1.0 and RSS 2.0 represented as two trees, and an ontology as a way of drawing connections between the leaves and branches of those trees. In a previous post entitled More on RDF, The Semantic Web and Perpetual Motion Machines I questioned how useful this actually would be in the real world by pointing out the dc:date vs. pubDate problem in RSS. I wrote

    However there are further drawbacks to using the semantics based approach than using the XML-based syntactic approach. In certain cases, where the mapping isn't merely a case of showing equivalencies between the semantics of similarly structured elements (e.g. the equivalent of element renaming such as stating that a url and link element are equivalent) an ontology language is insufficient and a Turing complete transformation language like XSLT is not. A good example of this is another example from RSS Bandit. In various RSS 2.0 feeds there are two popular ways to specify the date an item was posted, the first is by using the pubDate element which is described as containing a string in the RFC 822 format while the other is using the dc:date element which is described as containing a string in the ISO 8601 format. Thus even though both elements are semantically equivalent, syntactically they are not. This means that there still needs to be a syntactic transformation applied after the semantic transformation has been applied if one wants an application to treat pubDate and dc:date as equivalent. This means that instead of making one pass with an XSLT stylesheet to perform the transformation in the XML-based solution, two transformation techniques will be needed in the RDF-based solution and it is quite likely that one of them would be XSLT.
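    As a concrete aside, here is a simplified sketch of that syntactic normalization step: both elements mean "when was this item published", but each needs its own parsing logic. Real feeds use several RFC 822 time zone forms, so this is illustrative rather than production-ready:

    using System;
    using System.Globalization;
    using System.Xml;

    public class ItemDateNormalizer
    {
        public static DateTime FromDcDate(string isoDate)
        {
            // dc:date carries an ISO 8601 date, which XmlConvert understands natively.
            return XmlConvert.ToDateTime(isoDate);
        }

        public static DateTime FromPubDate(string rfc822Date)
        {
            // pubDate carries an RFC 822 date such as "Sat, 07 Sep 2002 09:42:31 GMT".
            return DateTime.ParseExact(rfc822Date,
                "ddd, dd MMM yyyy HH:mm:ss 'GMT'",
                CultureInfo.InvariantCulture);
        }
    }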

    The above is a simple example; one could imagine more complex examples where the vocabularies to be mapped differ much more syntactically, such as

    <author>Dare Obasanjo (dareo@example.com)</author>

    <author>
     <fname>Dare</fname>
     <lname>Obasanjo</lname>
     <email>dareo@example.com</email>
    </author>

    The aforementioned examples point out technical issues with using ontology based techniques for mapping between XML vocabularies but I failed to point out the human problems that tend to show up in the real world. A few months ago I was talking to Chris Lovett about semantic integration and he pointed out that in many cases as applications evolve semantics begin to be assigned to values in often orthogonal ways.

    An example of semantics being added to values again shows up in an example that uses RSS Bandit. A feature of RSS Bandit is that feeds are cached on disk, allowing a user to read items that have long since disappeared from the feed. At first we provided the ability for the user to specify how long items should be kept in the cached feed, ranging from a day up to a year. We used an element named maxItemAge embedded in the cached feed which contained a serialized instance of the System.TimeSpan structure. After a while we realized we needed ways to say that for a particular feed the global default maxItemAge should be used, that items for this feed should never be cached, or that items for this feed should never expire, so we used the TimeSpan.MinValue, TimeSpan.Zero and TimeSpan.MaxValue values of the TimeSpan class respectively.

    If another application wanted to consume this data and had a similar notion of 'how long to keep the items in a feed' it couldn't simply map maxItemAge to whatever internal property it used without taking into account the extra semantics attached to certain values of that element. Overloading the meaning of properties and fields in a database or class is actually fairly commonplace [after all, how many different APIs use the occurrence of -1 for a value that should typically return a positive number as an error condition?] and something that must also be considered when applying semantic integration technologies to XML.
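
    As a rough illustration, here is a hypothetical consumer of the cached feed (the class and method names are made up; only the sentinel convention mirrors what is described above) that has to special-case those values before it can map maxItemAge onto its own notion of item expiry:

      using System;

      class FeedCachePolicy
      {
          // maxItemAge cannot be consumed as a plain duration; three of its values
          // carry extra, out-of-band meaning that a consumer must know about
          static string Interpret(TimeSpan maxItemAge)
          {
              if (maxItemAge == TimeSpan.MinValue)
                  return "use the global default maxItemAge";
              if (maxItemAge == TimeSpan.Zero)
                  return "never cache items for this feed";
              if (maxItemAge == TimeSpan.MaxValue)
                  return "never expire items for this feed";
              return "expire items older than " + maxItemAge.TotalDays + " days";
          }
      }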

    In conclusion, it is clear that Semantic Web technologies can be used to map between XML vocabularies; however, in non-trivial situations the extra work that must be layered on top of such approaches tends to favor using XML-centric techniques such as XSLT to map between the vocabularies instead.


     

    Categories: XML

    February 15, 2004
    @ 03:07 AM

    Just as it looks like my buddy Erik Meijer is done with blogging (despite his short-lived guest blogging stint at Lambda the Ultimate) it looks like a couple more of the folks who brought Xen to the world have started blogging. They are

    1. William Adams: Dev Manager for the WebData XML team.

    2. Matt Warren: Formerly a developer on the WebData XML team, now works on the C# team or on the CLR (I can never keep those straight).

    Both of them were also influential in the design and implementation of the System.Xml namespace in version 1.0 of the .NET Framework.


     

    February 14, 2004
    @ 09:03 PM

    A couple of days ago I wrote about The war in Iraq and whether the actions of the US administration could be considered a war crime. It seems this struck a nerve with at least one of my readers. In a response to that entry Scott Lare wrote

    Today, between Afganistan and Iraq there are approx 50 million people who were previously under regimes of torture who now have a "chance" at freedom. Get a grip on reality! Talk about missing the point and moronism.

    I find it interesting that Scott Lare sees the need to put chance in quotes. Now ignoring the fact that these “regimes of torture” were in fact supported by the US when it was politically expedient, the question is whether people's lives are any better in Afghanistan and Iraq now that they live in virtual anarchy as opposed to under oppressive regimes. In a post entitled Women as property and U.S.-funded nation-building he excerpts a New York Times opinion piece which states

    Consider these snapshots of the new Afghanistan:

    • A 16-year-old girl fled her 85-year-old husband, who married her when she was 9. She was caught and recently sentenced to two and a half years' imprisonment.

    • The Afghan Supreme Court has recently banned female singers from appearing on Afghan television, barred married women from attending high school classes and ordered restrictions on the hours when women can travel without a male relative.

    • When a man was accused of murder recently, his relatives were obliged to settle the blood debt by handing over two girls, ages 8 and 15, to marry men in the victim's family.

    • A woman in Afghanistan now dies in childbirth every 20 minutes, usually without access to even a nurse. A U.N. survey in 2002 found that maternal mortality in the Badakshan region was the highest ever recorded anywhere on earth: a woman there has a 50 percent chance of dying during one of her eight pregnancies.

    • In Herat, a major city, women who are found with an unrelated man are detained and subjected to a forced gynecological exam. At last count, according to Human Rights Watch, 10 of these "virginity tests" were being conducted daily.

    ... Yet now I feel betrayed, as do the Afghans themselves. There was such good will toward us, and such respect for American military power, that with just a hint of follow-through we could have made Afghanistan a shining success and a lever for progress in Pakistan and Central Asia. Instead, we lost interest in Afghanistan and moved on to Iraq.

    ... Even now, in the new Afghanistan we oversee, they are being kidnapped, raped, married against their will to old men, denied education, subjected to virginity tests and imprisoned in their homes. We failed them. 

    To people like Scott I'll only say this; life isn't an action movie where you show up, shoot up all the bad guys and everyone lives happily ever after. What has happened in Afghanistan is that the US military has shot up some bad guys who have now been replaced by a different set of bad guys. Short of colonizing the country and forcing social change there isn't much the US military can do for a lot of people in Afghanistan, especially the women. I accept this but it really irritates me when I hear people mouth off about how “life is so much better” because the US military dropped some bombs on the “bad guys”.

    As for Iraq, John Robb has a link to an interesting article on the current state of affairs. He writes

    Debka has some interesting analysis that indicates that the US is in a bind.  The recent moves to empower Iraqi defense forces to take control of city centers is premature (as proved in the brazen attack in Fallujah yesterday).  At the same time the US is committed to a shift of power this summer and the UN is talking about elections this fall.  There are three potential outcomes for this:

    • A full civil war that draws in adjacent powers.
    • Democracy and stability under Sunni leadership. 
    • More US occupation but with increasing resistance.

    How would you assign the odds (in percentages) for each outcome?

    Considering the animosity between the various factions in Iraq, democracy and stability may not go hand in hand. Being Nigerian, I know first hand that democracy doesn't automatically mean stability; I guess that's why some refer to us as The New Pakistan


     

    Categories: Ramblings

    February 13, 2004
    @ 03:30 PM

    Mark Pilgrim has a post entitled Determining the character encoding of a feed where he does a good job of summarizing what the various specs say about determining the character encoding of an XML document retrieved on the World Wide Web via HTTP. The only problem with his post is that although it is a fairly accurate description of what the specs say it definitely does not reflect reality. Specifically

    According to RFC 3023..., if the media type given in the Content-Type HTTP header is text/xml, text/xml-external-parsed-entity, or a subtype like text/AnythingAtAll+xml, then the encoding attribute of the XML declaration within the document is ignored completely, and the encoding is

    1. the encoding given in the charset parameter of the Content-Type HTTP header, or
    2. us-ascii.

    So for this to work correctly it means that if the MIME type of an XML document is text/xml then the web server should look inside it before sending it over the wire and send the correct encoding or else the document will be interpreted incorrectly since it is highly likely that us-ascii is not the encoding of the XML document. In practice, most web servers do not do this. I have confirmed this by testing against both IIS and Apache.

    Instead what happens is that an XML document is created by the user and dropped on the file system and the web server assumes it is text/xml which it most likely is and sends it as is without setting the charset in the content type header.   

    A simple way to test this is to go to Rex Swain's HTTP Viewer and download the following documents from the W3 Schools page on XML encodings

    1. XML document in windows-1252 encoding
    2. XML document in ISO-8859-1 encoding
    3. XML document in UTF-8 encoding
    4. XML document in UTF-16 encoding

    All files are sent with a content type of text/xml and no encoding specified in the charset parameter of the Content-Type HTTP header. According to RFC 3023, which Mark Pilgrim quoted in his article, clients should treat them as us-ascii. With the above examples this behavior would be wrong in all four cases.

    The moral of this story is that if you are writing an application that consumes XML using HTTP you should use the following rule of thumb for the foreseeable future [slightly modified from Mark Pilgrim's post]

    According to RFC 3023, if the media type given in the Content-Type HTTP header is application/xml, application/xml-dtd, application/xml-external-parsed-entity, or any one of the subtypes of application/xml such as application/atom+xml or application/rss+xml or even application/rdf+xml, text/xml, text/xml-external-parsed-entity, or a subtype like text/AnythingAtAll+xml then the encoding is

    1. the encoding given in the charset parameter of the Content-Type HTTP header, or
    2. the encoding given in the encoding attribute of the XML declaration within the document, or
    3. utf-8.
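
    Here is a rough C# sketch of that rule of thumb (the class and method names are made up and the charset parsing is deliberately simplistic, so treat it as an illustration rather than production code):

      using System.Net;
      using System.Text;
      using System.Xml;

      class FeedEncodingSniffer
      {
          static Encoding GetFeedEncoding(string url)
          {
              HttpWebResponse response = (HttpWebResponse) WebRequest.Create(url).GetResponse();
              try
              {
                  // 1. the charset parameter of the Content-Type header, if present
                  string contentType = response.ContentType.ToLower();
                  int index = contentType.IndexOf("charset=");
                  if (index != -1)
                  {
                      string charset = contentType.Substring(index + "charset=".Length).Split(';')[0].Trim();
                      return Encoding.GetEncoding(charset);
                  }

                  // 2. the encoding given in the XML declaration, if present
                  //    (XmlTextReader sniffs the BOM and the declaration on the first Read)
                  XmlTextReader reader = new XmlTextReader(response.GetResponseStream());
                  reader.Read();
                  if (reader.Encoding != null)
                      return reader.Encoding;

                  // 3. fall back to utf-8
                  return Encoding.UTF8;
              }
              finally
              {
                  response.Close();
              }
          }
      }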

    Some may argue that this discussion isn't relevant for news aggregators because they'll only consume XML documents whose MIME type is application/atom+xml or application/rss+xml, but again this ignores practice. In practice most web servers send back RSS feeds as text/xml; if you don't believe me, test ten RSS feeds chosen at random using Rex Swain's HTTP Viewer and see what MIME type the server claims they are.


     

    Categories: XML

    February 12, 2004
    @ 11:47 PM

    According to Jeremy Zawodny the “My Yahoo's RSS module also groks Atom. It was added last night. It took about a half hour.” Seeing that he said it took only 30 minutes to implement this, and there are a couple of things about ATOM that require a little thinking about even if all you are interested in is titles and dates as My Yahoo! is, I decided to give it a try and subscribe to Mark Pilgrim's Atom feed. This is what I ended up being shown in My Yahoo!

    [screenshot of the “dive into mark” module in My Yahoo!]

    The first minor issue is that the posts aren't sorted chronologically but that isn't particularly interesting. What is interesting is that if you go to the article entitled The myth of RSS compatibility its publication date is said to be “Wednesday, February 4, 2004” which is about a week ago, and if you go to the post entitled Universal Feed Parser 3.0 beta its publication date is said to be “Wednesday, February 1, 2004” which is almost 2 weeks ago, not a day ago like Yahoo! claims.

    The simple answer to the confusion can be gleaned from Mark's ATOM feed: that particular entry has a <modified> date of 2004-02-11T16:17:08Z, an <issued> date of 2004-02-01T18:38:15-05:00 and a <created> date of 2004-02-01T23:38:15Z. My Yahoo! is choosing to key the freshness of an article off its <modified> date even though when one gets to the actual content it seems much older.
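
    For what it's worth, an aggregator that wants to match the reader's intuition could key off <issued> instead. A small, purely illustrative sketch (only the Atom 0.3 namespace below is a real detail; the method name is invented):

      using System;
      using System.Xml;

      class AtomEntryDates
      {
          // Given an Atom 0.3 <entry> element, prefer <issued> over <modified>
          // when deciding how "fresh" the post is
          static string GetDisplayDate(XmlElement entry)
          {
              XmlNamespaceManager nsmgr = new XmlNamespaceManager(entry.OwnerDocument.NameTable);
              nsmgr.AddNamespace("atom", "http://purl.org/atom/ns#");

              XmlNode issued   = entry.SelectSingleNode("atom:issued", nsmgr);
              XmlNode modified = entry.SelectSingleNode("atom:modified", nsmgr);

              XmlNode date = (issued != null) ? issued : modified;
              return (date != null) ? date.InnerText : String.Empty;
          }
      }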

    It is quite interesting to see how just one concept [how old is this article?] can lead to some confusion between the end user of a news aggregator and the content publisher. I also suspect that My Yahoo! could be similarly confused by the various issues with escaping content in Atom when processing titles but since I don't have access to a web server I can't test some of my theories.

    I tend to wonder whether the various content producers creating Atom feeds will ditch their feeds for Atom 0.4, Atom 0.5 and so on up until it becomes a final IETF spec, or whether they'll keep parallel versions of these feeds so Atom 0.3 continues to live in perpetuity.

    It's amazing how geeks can turn the simplest things into such a mess. I'm definitely going to sit it out until there is a final IETF Atom 1.0 syndication format spec before spending any time working on this for RSS Bandit.


     

    Categories: Technology

    February 11, 2004
    @ 04:02 PM

    One of the big problems with arguing about metadata is that one person's data is another person's metadata. I was reading Joshua Allen's blog post entitled Trolling EFNet, or Promiscuous Memories where he wrote

  1. Some people deride "metacrap" and complain that "nobody will enter all of that metadata".  These people display a stunning lack of vision and imagination, and should be pitied.  Simply by living their lives, people produce immense amounts of metadata about themselves and their relationships to things, places, and others that can be harvested passively and in a relatively low-tech manner.
  2. Being able to remember what we have experienced is very powerful.  Being able to "remember" what other people have experienced is also very powerful.  Language improved our ability to share experiences to others, and written language made it possible to communicate experiences beyond the wall of death, but that was just the beginning.  How will your life change when you can near-instantly "remember" the relevant experiences of millions of other people and in increasingly richer detail and varied modality?
    From my perspective it seems Joshua is confusing data and metadata. If I had a video camera attached to my forehead recording everything I saw, then the actual audiovisual content of the files on my hard drive would be the data while the metadata would be information such as what date it was, where I was and who I saw. Basically the metadata is the data about data. The interesting thing about metadata is that if we have enough good quality metadata then we can do things like near-instantly "remember" the relevant experiences of ourselves and millions of other people. It won't matter if all my experiences are cataloged and stored on a hard drive if the retrieval process isn't automated (i.e. if I can't 'search' for experiences by who they were shared with, or where or when they occurred) and I instead have to fast forward through gigabytes of video data. The metadata ideal would be that all this extra, descriptive information would be attached to my audiovisual experiences stored on disk so I could quickly search for “videos from conversations with my boss in October, 2003”.

    This is where metacrap comes in. From Cory Doctorow's excellent article entitled Metacrap

    A world of exhaustive, reliable metadata would be a utopia. It's also a pipe-dream, founded on self-delusion, nerd hubris and hysterically inflated market opportunities.

    This applies to Joshua's vision as well. Data acquisition is easy; anyone can walk around with a camcorder or digital camera today recording everything they can. Effectively tagging the content so it can be categorized in ways that let you do interesting things with it search-wise is what is unfeasible. Cory's article does a lot better job than I can at explaining the many different ways this is unfeasible; cameras with datestamps and built-in GPS are just the tip of the iceberg. I can barely remember dates once the event didn't happen in the recent past and wasn't a special occasion. As for built-in GPS, until the software is smart enough to convert longitude and latitude coordinates to “that Chuck E Cheese in Redmond” they only solve problems for geeks, not regular people. I'm sure technology will get better but metacrap is and may always be an insurmountable problem on a global network like the World Wide Web without lots of standardization.


     

    Categories: Technology

    February 11, 2004
    @ 03:36 PM

    Besides our releases, Torsten packages nightly builds of RSS Bandit for folks who want to try out bleeding edge features or test recent bug fixes without having to set up a CVS client. There is currently a bug pointed out by James Clarke that we think is fixed but would like interested users to test

    I'm hitting a problem once every day or so when the refresh thread seems to be exiting - The message refreshing Feeds never goes away and the green download icons for the set remain green forever. No feed errors are generated. Only way out is to quit.

    If you've encountered this problem on recent versions of RSS Bandit, try out the RSS Bandit build from February 9th 2004 and see if that fixes the problem. Once we figure out the root of the problem and fix it there'll be a refresh of the installer with the updated bits.

     


     

    Categories: RSS Bandit

    February 11, 2004
    @ 02:51 AM

    From Sam Ruby's slides for the O'Reilly Emerging Technology Conference

    Where are we going?

    • A draft charter will be prepared in time to be informally discussed at the IETF meeting in Seoul, Korea on the week of 29 February to 5 March 
    •  Hopefully, the Working Group itself will be approved in March 
    •  Most of the work will be done on mailing lists 
    •  Ideally, a face to face meeting of the Working Group will be scheduled to coincide with the August 1-6 meeting of the IETF in San Diego

    Interesting. Taking the spec to IETF implies that Sam thinks it's mostly done.  Well, I just hope the IETF's errata process is better than the W3C's.


     

    Categories: Technology

    February 10, 2004
    @ 05:30 PM

    Robert Scoble has a post entitled Metadata without filling in forms? It's coming where he writes

    Simon Fell read my interview about search trends and says "I still don't get it" about WinFS and metadata. He brings up a good point. If users are going to be forced to fill out metadata forms, like those currently in Office apps, they just won't do it. Fell is absolutely right. But, he assumed that metadata would need to be entered that way for every photo. Let's go a little deeper.... OK, I have 7400 photos. I have quite a few of my son. So, let's say there's a new kind of application. It recognizes the faces automatically and puts a square around them. Prompting you to enter just a name. When you do, the square changes color from red to green, or just disappears completely.
    ...
    A roadblock to getting that done today is that no one in the industry can get along for enough time to make it possible to put metadata into files the way it needs to be done. Example: look at the social software guys. Friendster doesn't play well with Orkut which doesn't play well with MyWallop, which doesn't play well with Tribe, which doesn't play well with ICQ, which doesn't play well with Outlook. What's the solution? Fix the platform underneath so that developers can put these features in without working with other companies and/or other developers they don't usually work with.

    The way WinFS is being pitched by Microsoft folks reminds me a lot of Hailstorm [which is probably unsurprising since a number of Hailstorm folks work on it] in that there are a lot of interesting and useful technical ideas burdened by bad scenarios being hung on them. Before going into the interesting and useful technical ideas around WinFS I'll start with why I consider the two scenarios mentioned by Scoble “bad scenarios”.

    The notion that making the file system a metadata store automatically makes search better is a dubious proposition to swallow when you realize that a number of the searches people can't do today wouldn't be helped much by more metadata. This isn't to say some searches wouldn't work better (e.g. searching for songs by title or artist); however there are some search scenarios, such as searching for a particular image or video among a bunch of files with generic names or searching for a song by its lyrics, for which simply having the ability to tag media files with metadata doesn't seem like enough. Once your scenarios have to involve “face recognition software” or “cameras with GPS coordinates” to work, it is hard for people not to scoff. It's like a variation of the popular Slashdot joke

    1. Add metadata search capabilities to file system
    2. ???
    3. You can now search for “all pictures taken on Tommy's 5th birthday party at the Chuck E Cheese in Redmond”.

     with the ??? in the middle implying a significant difficulty in going from step 1 to step 3.

    The other criticism is that Robert's post implies that the reasons applications can't talk to each other are technical. This is rarely the case. The main reason applications don't talk to each other isn't a lack of technology [especially now that we have a well-defined format for exchanging data called XML] but various social and business reasons. There are no technical reasons MSN Messenger can't talk to ICQ or which prevent Yahoo! Messenger from talking to AOL Instant Messenger. It isn't technical reasons that prevent my data in Orkut from being shared with Friendster or my book & music preferences in Amazon from being shared with other online stores I visit. All of these entities feel they have a competitive advantage in making it hard to migrate from their platforms.

    The two things Microsoft needs to do in this space are to (i) show how & why it is beneficial for different applications to share data locally and (ii) provide guidelines as well as best practices for applications to share their data in a secure manner.

    While talking to Joshua Allen, Dave Winer, Robert Scoble, Lili Cheng, and Curtis Wong yesterday it seemed clear to me that social software [or if you are a business user, groupware that is more individual-focused and gives people more control over content and information sharing] would be a very powerful and useful tool for businesses and end users if built on a platform like Longhorn with a smart data store that knows how to create relationships between concepts as well as files (i.e. WinFS) and a flexible, cross platform distributed computing framework (i.e. Indigo).

    The WinFS folks and Longhorn evangelists will probably keep focusing on what I have termed “bad scenarios” because they demo well, but I suspect that there'll be difficulty getting traction with them in the real world. Of course, I may be wrong and the various people who've expressed incredulity at the current pitches are a vocal minority who'll be proved wrong once others embrace the vision. Either way, I plan to experiment with these ideas once Longhorn starts to beta and see where the code takes me.


     

    Categories: Technology

    February 10, 2004
    @ 05:59 AM

    As Joshua wrote in his blog we had lunch with Dave Winer this afternoon. We talked about the kind of stuff you'd have expected; RSS, ATOM and "social software". An interesting person at lunch was Lili Cheng who's the Group Manager of the Social Computing Group in Microsoft Research*. She was very interested in the technologies around blogging and thought “social software” could become a big deal if handled correctly. Her group is behind Wallop and I asked if she'd be able to wrangle an invitation so I could check it out. Given my previous negative impressions of Social Software I'm curious to see what the folks at MSR have come up with. She seemed aware of the limitations of the current crop of “social software” that are hip with some members of the blogging crowd, so I'd like to see what she thinks they do differently. I think a fun little experiment would be seeing what it would be like to integrate some interaction with “social software” like Wallop into RSS Bandit. Too bad my free time is so limited.

    * So MSFT has a Social Computing Group and Google has Orkut? If I worked at Friendster it would seem the exit strategy is clear, try to get bought by Yahoo! before the VC funds dry up.


     

    Categories: Ramblings

    In his blog post entitled Namepaces in Xml - the battle to explain Steven Livingstone wrote

    It seems that Namespaces is quickly displacing Xml Schema as the thing people "like to hate" - well at least those that are contacting me now seem to accept Schema as "good".

    Now, the concept of namespaces is pretty simple, but because it happens to be used explicitly (and is a more manual process) in Xml people just don't seem to get it. There were two core worries put to me - one calling it "a mess" and the other "a failing". The whole thing centered around having to know what namespaces you were actually using (or were in scope) when selecting given nodes. So in the case of SelectNodes(), you need to have a namespace manager populated with the namespaces you intend to use. In the case of Schema, you generally need to know the targetNamespace of the Schema when working with the XmlValidatingReader. What the guys I spoke with seemed to dislike is that you actually have to know what these namespaces are. Why bother? Don't use namespaces and just do your selects or validation.

    Given that I am to some degree responsible for both classes mentioned in the above post, XmlNode (where SelectNodes() comes from) and XmlValidatingReader, I feel compelled to respond.

    The SelectNodes() problem is that people would like to perform XPath expressions over nodes and not have to worry about namespaces. For example, given XML such as

    <root xmlns="http://www.example.com">

    <child />

    </root>

    to perform a SelectNodes() or SelectSingleNode() that returns the <child> element requires the following code

      XmlDocument doc = new XmlDocument(); 
      doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>"); 
      XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable); 
      nsmgr.AddNamespace("foo", "http://www.example.com");  //this is the tricky bit 
      Console.WriteLine(doc.SelectSingleNode("/foo:root/foo:child", nsmgr).OuterXml);   

    whereas developers don't see why the code isn't something more along the lines of

      XmlDocument doc = new XmlDocument(); 
      doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>"); 
      Console.WriteLine(doc.SelectSingleNode("/root/child").OuterXml);   

    which would be the case if there were no namespaces in the document.

    The reason the latter code sample is not the case is because the select methods on the XmlDocument class are conformant to the W3C XPath 1.0 recommendation which is namespace aware. In XPath, path expressions that match nodes based on their names are called node tests. A node test is a qualified name or QName for short. A QName is syntactically an optional prefix and local name separated by a colon. The prefix is supposed to be mapped to a namespace and is not to be used literally in matching the expression. Specifically the spec states

    A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded). It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.

    There are a number of reasons why this is the case which are best illustrated with an example. Consider the following two XML documents

    <root xmlns="urn:made-up-example">

    <child xmlns="http://www.example.com" />

    </root>

    <root>

    <child />

    </root>

    Should the query /root/child also match the <child> element for the above two documents as it does for the original document in this example? The 3 documents shown [including the first example] are completely different documents and there is no consistent, standards compliant way to match against them using QNames in path expressions without explicitly pairing prefixes with namespaces.

    The only way to give people what they want in this case would be to come up with a proprietary version of XPath which was namespace agnostic. We do not plan to do this. However, I do have a tip for developers on how to reduce the amount of code it takes to write such queries. The following code does match the <child> element in all three documents and is fully conformant with the XPath 1.0 recommendation

    XmlDocument doc = new XmlDocument(); 
    doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>"); 
    Console.WriteLine(doc.SelectSingleNode("/*[local-name()='root']/*[local-name()='child']").OuterXml);  

    Now on to the XmlValidatingReader issue. Assume we are given the following XML instance and schema

    <root xmlns="http://www.example.com">
     <child />
    </root>

    <xs:schema targetNamespace="http://www.example.com"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                elementFormDefault="qualified">
           
      <xs:element name="root">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="child" type="xs:string" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>

    </xs:schema>

    The instance document can be validated against the schema using the following code

    XmlTextReader tr = new XmlTextReader("example.xml");
    XmlValidatingReader vr = new XmlValidatingReader(tr);
    vr.Schemas.Add(null, "example.xsd");

    vr.ValidationType = ValidationType.Schema;
    vr.ValidationEventHandler += new ValidationEventHandler(ValidationHandler);

    while(vr.Read()){ /* do stuff or do nothing */ }

    // the ValidationHandler method referenced above, called once per validation error or warning
    static void ValidationHandler(object sender, ValidationEventArgs args){
      Console.WriteLine(args.Message);
    }

    As you can see, you do not need to know the target namespace of the schema to perform schema validation using the XmlValidatingReader. However, many code samples in our SDK specify the target namespace where I specified null above when adding schemas to the Schemas property of the XmlValidatingReader. When null is specified it indicates that the target namespace should be obtained from the schema. This would have been clearer if we'd had an overload for the Add() method which took only the schema, but we didn't. Hindsight is 20/20.


     

    Categories: XML

    February 8, 2004
    @ 10:15 PM

    I noticed Gordon Weakliem reviewed ATOM.NET, an API for parsing and generating ATOM feeds. I went to the ATOM.NET website and decided to take a look at the ATOM.NET documentation. The following comments come from two perspectives, the first is as a developer who'll most likely have to implement something akin to ATOM.NET for RSS Bandit's internal workings and the other is from the perspective of being one of the folks at Microsoft whose job it is to design and critique XML-based APIs.

    • The AtomWriter class is superfluous. The class has only one method, Write(AtomFeed), which makes more sense on the AtomFeed class since an object should know how to persist itself (a hypothetical sketch of this shape follows this list). This is the model we followed with the XmlDocument class in the .NET Framework which has an overloaded Save() method. The AtomWriter class would be quite useful if it allowed you to perform schema driven generation of an AtomFeed, the same way the XmlWriter class in the .NET Framework is aimed at providing a convenient way to programmatically generate well-formed XML [although it comes close but doesn't fully do this in v1.0 & v1.1 of the .NET Framework]

    • I have the same feelings about the AtomReader class, which also seems superfluous. The functionality it provides is akin to the overloaded Load() method we have on the XmlDocument class in the .NET Framework. I'd say it makes more sense and is more usable if this functionality were provided as a Load() method on the AtomFeed class rather than as a separate class, unless the AtomReader class actually gets some more functionality.

    • There's no easy way to serialize an AtomEntry class as XML, which means it'll be cumbersome to use ATOM.NET with the ATOM API since that requires sending entries as XML over the wire. I use this functionality all the time internally in RSS Bandit, from passing entries as XML to XSLT themes, to the CommentAPI, to IBlogExtension.

    • There is no consideration for how to expose extension elements and attributes in ATOM.NET. As far as I'm concerned this is a deal breaker that makes ATOM.NET useless for aggregator authors since it means they can't handle extensions in ATOM feeds even though they may exist and have already started popping up in various feeds.
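
    Here is a hypothetical sketch of the shape suggested in the first two points above (this is not the actual ATOM.NET API; the member names are invented and the bodies elided):

      using System.Xml;

      public class AtomFeed
      {
          // title, entries, extension elements, etc. elided

          // the feed knows how to populate itself, like XmlDocument.Load()
          public void Load(string url)     { /* ... */ }
          public void Load(XmlReader r)    { /* ... */ }

          // ... and how to persist itself, like XmlDocument.Save()
          public void Save(string file)    { /* ... */ }
          public void Save(XmlWriter w)    { /* ... */ }

          // makes it easy to embed an entry or the whole feed as XML in another
          // document (XSLT themes, the CommentAPI, IBlogExtension and so on)
          public void WriteTo(XmlWriter w) { /* ... */ }
      }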


     

    Categories: XML

    February 8, 2004
    @ 09:37 PM

    Lots of people seem to like the newest version of RSS Bandit. The most recent praise was the following post by Matt Griffith

    I've been a Bloglines user for almost a year. I needed a portable aggregator because I use several different computers. Then a few months ago I got a TabletPC. Now portability isn't as critical since I always have my Tablet with me. I stayed with Bloglines though because none of the client-side aggregators I tried before worked for me.

    I just downloaded the latest version of RSS Bandit. I love it. It is much more polished than it was the last time I tried it. Combine that with the dasBlog integration and the upcoming SIAM support and I'm in hog heaven. Thanks Dare, Torsten, and everyone else that helped make RssBandit what it is.

    Also it seems that at least one user liked RSS Bandit so much that he [or she] was motivated to write an article on Getting Started with RSS Bandit. Definitely a good starting point and something I wouldn't mind seeing become part of the official documentation once it's been edited and more details fleshed out.

    Sweet.


     

    Categories: RSS Bandit

    A few weeks ago during the follow up to the WinFX review of the System.Xml namespace of the .NET Framework it was pointed out that our team hadn't provided guidelines for exposing and manipulating XML data in applications. At first, I thought the person who brought this up was mistaken but after a cursory search I realized the closest thing that comes to such a set of guidelines is Don Box's MSDN TV episode entitled Passing XML Data Inside the CLR. As good as Don's discussion is, a video stream isn't as accessible as a written article. In tandem with coming up with some of the guidelines for utilizing XML in the .NET Framework for internal purposes I'll put together an article based on Don's MSDN TV episode with an eye towards the next version of the .NET Framework.

    If you watched Don's talk and have any questions about it or require any clarifications, respond below so I can address them in the article I plan to write.


     

    Categories: XML

    February 8, 2004
    @ 08:59 PM

    Dave Winer is going to be giving a talk at Microsoft Research tomorrow. Robert Scoble is organizing a lunch before the talk with some folks at MSFT and Dave. I may or may not make it since my mom's visiting from Nigeria and I was planning to take most of the week off. Just in case I miss it, there is one thing I'd like Dave to know; most of the problems in the XML-based website syndication space could have been solved if he didn't act as if, once he wrote a spec or code for the Radio Userland aggregator, it was impossible to change. Most of the supposed “problems” with RSS would take 30 minutes to fix in the spec and about a day to fix in the Radio Userland codebase (I'm making assumptions here based on how long it would take in the RSS Bandit codebase). Instead he stonewalled and now we have the ATOM mess. Of course, we'd still need something like the ATOM effort to bring the blogging APIs into the 21st century but we wouldn't have to deal with incompatibilities at the website syndication level as well.

     

    In a recent blog post Dave mentions that his MSR talk will mainly be about the themes from his article Howard Dean is not a soap bar. I don't really have an opinion on the content one way or the other but I did dislike the way he applies selective memory to prove a point specifically

    In the lead-up to the war in Iraq, for some reason, people who were against the war didn't speak.

    Maybe they didn't speak on the East coast but there was a very active anti-War movement on the West coast especially in the Seattle area. Actually they did speak out on the East Coast as well, in fact hundreds of thousands of voices all over the US and all over the world spoke out.

    It makes me view the “blogs are the second coming” hype with suspicion when its boosters play fast and loose with the facts to sell their vision.


     

    Categories: Life in the B0rg Cube

    I've seen a lot of the hubbub about Janet Jackson's "costume reveal" at the Superbowl and tend to agree with Dan Gillmor that it was just one in a series of classless and crass things about the Superbowl. However I've learned something new about Janet Jackson that I didn't know before this incident: she's dating Jermaine Dupri. Now I'm not one to knock someone's lifestyle choices but come on, Jermaine Dupri? That's almost as stunning as finding out that Whitney Houston ended up with Bobby “3 baby mamas” Brown.


     

    Categories: Ramblings

    February 8, 2004
    @ 06:02 PM

    I stumbled on a link to the MSN Search beta site while reading about Google cancelling its Spring IPO. What I find particularly interesting is that it seems to use algorithms very similar to Google's, given that it falls for Google Bombs as well. For example, check out the results for miserable failure and litigious bastards. In fact, the results are so similar that at first I thought it was a spoof site that was just calling out to Google in the background.


     

    February 7, 2004
    @ 11:28 PM

    Yesterday I wrote an entry about the fact that given Microsoft is such a big player in the software market there is the perception [whether correct or incorrect] that once Microsoft enters a technology or product space then smaller players in the market will lose out as customers migrate or Microsoft outmarkets/outspends/outcompetes them. The post also dwelled on a related topic, the perception that Microsoft is fond of vaporware announcements to hinder progress in prospective markets.

    After writing the post, I deleted it after it had been on my blog for about 5 minutes. I wasn't happy with the quality of the writing and didn't feel I properly expressed my thoughts. However just the process of writing stuff down made me feel better. Having seen the effects of Microsoft entering smaller markets on existing participants at a personal level (at least one person has claimed that the fact that I created EXSLT.NET lost him business) as well as at the product unit level (the various technologies the SQL Server Product Unit comes up with, from ADO.NET to SQL Server to core XML technologies) there were various thoughts bubbling within me, and writing them down helped me understand and then come to grips with them.

    I definitely need to get a personal journal. I'd have loved to read that post one, five and ten years from now. However it wasn't really for public consumption.


     

    Categories: Life in the B0rg Cube

    February 7, 2004
    @ 11:10 PM
    1. Autodiscovering feeds as you browse the Web. Every link to a feed found on the web pages you browse to from RSS Bandit is available in a handy drop down


    2. Unread items folder

       

      Categories: RSS Bandit

      February 7, 2004
      @ 11:00 PM

      This is a bugfix release. Differences between v1.2.0.89 and v1.2.0.90 below.

      • FIXED: RSS Bandit icons don't show up in shortcuts on desktop or in start menu.

      • FIXED: Search Folders can now also be saved without specifying a search expression (e.g. Unread items only).

      • FIXED: Posts in Unread item folders look like gibberish.

      • FIXED: RSS Bandit no longer tries to convert HTML in feeds to XHTML. This means a large number of feed errors about undeclared namespaces and the like should no longer appear.

      • FIXED: The [Next Unread Item] button now iterates through posts in selected Search folder, if there are unread items.

      • FIXED: Sometimes an exception is thrown if the [Next Unread Item] button is pressed while comments for an item are being downloaded

      • FIXED: Tree view flickers when application is loaded


       

      Categories: RSS Bandit

      February 6, 2004
      @ 05:00 PM

      A few days ago XML 1.1 became an official W3C recommendation. Mark Pilgrim, contrary to W3C guidelines, has celebrated by converting his RSS feed to XML 1.1 which means it currently cannot be processed by any Microsoft XML technologies from the XML parsers in the .NET Framework to MSXML which is used in a host of products from Internet Explorer to Office 2003.
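
      You can see the incompatibility for yourself with a few lines of C#; the .NET XML parser only understands XML 1.0, so loading a document that declares version 1.1 should fail with an XmlException:

        using System;
        using System.Xml;

        class Xml11Test
        {
            static void Main()
            {
                XmlDocument doc = new XmlDocument();
                try
                {
                    // the version number in the XML declaration is the only difference
                    doc.LoadXml("<?xml version='1.1'?><root/>");
                }
                catch (XmlException e)
                {
                    Console.WriteLine("Not loaded: " + e.Message);
                }
            }
        }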

      This is the first step in fragmenting the interoperability on the Web gained by XML. It seems the next step will be W3C sanctioned binary XML. Anyway let's get back to XML 1.1. What exactly is wrong with it, one might ask? The biggest thing wrong with it is that it is backwards incompatible with XML 1.0. A good summary of all the things you need to know about XML 1.1 is covered in Chapter 3 of Elliotte Rusty Harold's Effective XML

      Everything you need to know about XML 1.1 can be summed up in two rules:

      1. Don't use it.

      2. (For experts only) If you speak Mongolian, Yi, Cambodian, Amharic, Dhivehi, Burmese or a very few other languages and you want to write your markup (not your text but your markup) in these languages, then you can set the version attribute of the XML declaration to 1.1. Otherwise, refer to rule 1.

      XML 1.1 does several things, one of them marginally useful to a few developers, the rest actively harmful.

      • It expands the set of characters allowed as name characters

      • The C0 control characters (except for NUL) such as form feed, vertical tab, BEL, and DC1 through DC4 are now allowed in XML text provided they are escaped as character references.

      • C1 control characters (except for NEL) must now be escaped as character references

      • NEL can be used in XML documents, but is resolved to a line feed on parsing.

      • Parsers may (but do not have to) tell client applications that Unicode data was not normalized

      • Namespace prefixes can be undeclared

      XML is a lousy format for most of the things it is used for. The one benefit it has is that it is widely supported and a guaranteed way to interoperate in a cross-platform manner. By tampering with this the W3C is effectively diluting one of the few benefits of using XML. This is a regrettable occurrence. Unfortunately it looks like things will get worse now that the W3C also wants to dabble in “binary XML”.


       

      Categories: XML

      A mailing list dedicated to discussing RSS Bandit is now available. Details on how to subscribe to the mailing list or view its archives are available here.

      Also thanks to all the folks that sent in feature requests. Torsten and I will sift through the two dozen responses, extract the various feature requests, discard some, prioritize the rest, enter them in the RSS Bandit feature request database then get coding.


       

      Categories: RSS Bandit

      I just tried out the President Match tool which tries to pick which US presidential candidate shares your views based on a poll, and my results were

        1. Kucinich 100%
        2. Kerry       96%
        3. Sharpton  90%
        4. Edwards   90%
        5. Dean        89%
        6. Clark        87%
        7. Lieberman   83%
        8. Bush         37%

      I'm unsurprised by how far away George Bush is from representing my perspectives but I am surprised that Kerry comes so close. I'm definitely going to do some research into his positions on certain issues. I had an ex-girlfriend call me a long time ago gushing about Kucinich and telling me I'd love his campaign if I ever looked into it; I guess she was right.


       

      February 4, 2004
      @ 05:52 AM

      Based on recent reports it looks like Colin Powell is practically admitting there were no Weapons of Mass Destruction in pre-war Iraq, even though this was the primary justification for the US invading the country in an action that has left an estimated 8,000 civilians dead. That's quite a number of people who've lost their lives over a clerical error, one which also happens to have sent the US deficit spiralling and made a few defence contractors richer.

      I was curious as to whether I could look up the definition of “war crime” and see if starting a war for bogus reasons qualifies. My search led me to the Crimes of War project and an article entitled Who Owns the Rules of War? which had these interesting paragraphs

      The enduring law established at Nuremberg has thus turned out not to be the ''crime of aggression'' but a reaffirmation of war crimes as traditionally understood -- with two important innovations made necessary by the Nazi death camps: genocide and crimes against humanity. Nuremberg also had serious gaps. Most significant, it failed to address the terror bombing of civilians and the deliberate consuming of whole cities (Dresden, Tokyo) by fire -- the most enthusiastic practitioners of the latter being the Allies.

      The failure to prosecute the Allies for firebombing cities is one of the strongest arguments today for why war-crimes tribunals should not be conducted by the victors. Many regard this argument as so clinching, in fact, that the mere charge of ''victor's justice'' is enough to end debate.

      That clarified things for me. The definition and prosecution of a “war crime” is really up to the victors so the answer to my question is that it is highly unlikely that the events leading up to the debacle in Iraq will ever be considered a “war crime”. The rest of the article goes further in convincing me that the term “war crime” is a mostly meaningless phrase which has never been uniformly applied and for which there are very few if any useful metrics.


       

      Categories: Ramblings

      February 4, 2004
      @ 04:20 AM

      Every once in a while someone asks whether I plan to add support for reading newsgroups via NNTP to RSS Bandit. I believe I've reached the point where I'm bored enough to actually give it a shot, and since there don't seem to be any freely available libraries that provide this functionality for .NET and have licensing requirements I'm comfortable with, I may have to write the NNTP code myself.

      The only question is whether this is actually functionality people are interested in or not. I definitely would use it since I monitor newsgroups like microsoft.public.dotnet.xml and microsoft.public.xml but would the average RSS Bandit user think it was a worthwhile feature or would my time be better spent elsewhere?


       

      Categories: RSS Bandit

      February 3, 2004
      @ 05:36 PM

      Below is the list of features I want to add over the coming months in order of priority. If you're interested in RSS Bandit's development post a response with the list of features you would like to see in order of priority.

      1. Support synchronizing RSS Bandit state across multiple machines using SIAM.

      2. Experiment with ways to improve performance, like removing the dependence on SgmlReader and working on multithreading issues.

      3. Work on getting localized versions of RSS Bandit for various languages. Need to recruit translators.  

      4. Figure out how to locate interesting content. Perhaps via Technorati integration?

      5. Support blog posting using ATOM API.

      6. Support ATOM syndication format.

      I'm really running out of ideas for features to add to RSS Bandit. It seems we already have more features than most other Windows desktop aggregators and there is only so much more we can add. Torsten is still looking at doing weird, wild, wonderful stuff like seeing what it would look like to add ThreadArcs to RSS Bandit.


       

      Categories: RSS Bandit

      In his post entitled Business Rules, OCL, XML and Schemas Daniel Cazzulino writes

      DonXML is proposing extensions to OCL to express business rules that can be used at code-gen time and at run-time. He mentions my Schematron implementation called Schematron.NET, which allows many business rules to be expressed simply in terms of standard XPath expressions. I believe such an XPath-based language is good enough to express almost every business rule.

      Udi Dahan commented as an example, a rule "only a bank manager can authorize a loan above X" which he said couldn't be expressed with Don's idea. It could, indeed, with something along these lines (XPath-like):

      <assert test="sec:principal-role('BankManager') and po:Loan/@Amount < 1000">
        Only a BankManager can place a loan of more than $1000.
      </assert>

      Using rules-based XML validation is a good way to augment the capabilities of the W3C XML Schema language which is traditionally used to describe message structures in SOAP-based XML Web Services. In the post on Daniel's blog Udi Dahan asks

      I like the technique. I'm still puzzling over the strategy. From a SOA approach, where does this go ? What makes it different/better than any other rules engine ? You've given me something to think about. Thank you.

      In an SOA approach the rules are part of the message contract. A service endpoint can accept certain kinds of messages that satisfy its message contract. Using a rule-based language like Schematron just makes for writing a tighter contract than one could write using a traditional XML schema language like XSD.
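
      As a minimal illustration (this is not Schematron.NET; the message and namespace below are invented), a service endpoint can check such an assertion with nothing more than the XPath support already in the .NET Framework:

        using System;
        using System.Xml;
        using System.Xml.XPath;

        class AssertionChecker
        {
            static void Main()
            {
                XmlDocument message = new XmlDocument();
                message.LoadXml("<po:Loan xmlns:po='urn:example:po' Amount='5000' />");

                XmlNamespaceManager nsmgr = new XmlNamespaceManager(message.NameTable);
                nsmgr.AddNamespace("po", "urn:example:po");

                // an assertion is just a boolean XPath expression over the message
                XPathNavigator nav = message.CreateNavigator();
                XPathExpression assertion = nav.Compile("/po:Loan/@Amount < 1000");
                assertion.SetContext(nsmgr);

                if (!(bool) nav.Evaluate(assertion))
                    Console.WriteLine("Assertion failed: loans of $1000 or more need extra authorization");
            }
        }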

      In fact, Aaron Skonnard wrote an article on MSDN entitled Extend the ASP.NET WebMethod Framework by Adding XML Schema Validation  that introduced this to some degree which he followed up with two episodes of MSDN TV; Validating Business Rules with XPath Assertions, Pt. 1 and Validating Business Rules with XPath Assertions, Pt. 2


       

      Categories: XML

      I've been trying out Orkut some more and I'm now pretty sure I think it is lame. There is the problem I mentioned previously in that it doesn't provide a way to create a hierarchy of friendships (i.e. differentiate friends from acquaintances, business partners from co-workers, etc) which, by the way, Don Park has an interesting solution for called Friendship Circles. The other reason I've decided it rubs me the wrong way is that it tends to encourage the collapsing of the various facets of a person's social life, as pointed out by Warren Ellis along with other criticisms which I agree with. Warren Ellis wrote

      Right now, it looks pretty much like an iteration of the Tribe.net system, with an eye on Friendster's apparent main function as a dating system. (Which means, oddly, it requests your business profile at the same time as it's asking you where you like to be fingered.) (Okay, maybe not.)...

      My current list of friends is mostly folks I know through geeking at work or on the Internet. Some I'd call friends and some I'd call acquaintances. Particularly interesting to me is the stark contrast that would show up if I actually had some of the folks I consider my close friends up there next to folks whose primary connection to me is work or being subscribed to the same mailing lists. They would be completely different, contrasting sets of people.

      However this isn't what I found interesting. I noticed that folks could form groups or communities on Orkut about specific topics and one of the ones I found by exploring the various friend-of-a-friend links was the Legalize Marijuana community. Considering that the various links I followed were mostly professional relationships I thought it was particularly bold and mayhap foolhardy for folks to do the equivalent of labelling themselves as drug users or at least “pro-drug”. I find this aspect of social software fascinating. I have already begun to notice how a blog collapses the various facets of one's character as one tries to serve different audiences ranging from friends & family to co-workers & customers. Adding to this delicate dance by exposing one's relationships, from the mundane & innocent to the illicit & illegal, to all and sundry including your boss, co-workers, business partners and any random person with an Orkut account is probably more than I can stomach. That doesn't change the fact that there is somewhat of a voyeuristic thrill in navigating some of these relationships. I just wonder how many private and business relationships have been or will be started or ended on the strength of some of the things discovered by navigating the various friend-of-a-friend links between various individuals.

      By the way, the rest of Warren Ellis's criticisms of Orkut are also ones I share so I'm including them below instead of repeating them myself in poorer prose

      It's coping pretty well as it starts taking the weight of several thousand early explorers. Most of whom, if they follow the accelerating process that's left Friendster a relative wasteland and given Tribe a bit of an echo, will be out of there again in a few weeks. It's faster than Fuckster and Tribe, but it shows that all these friend-of-a-friend things have really hit a wall. I mean, what can you actually do aside from invite all your friends and piss about on a couple of small message boards? Message boards that, unlike Tribe, allow anonymous postings and therefore devalue the message board experience? What happens after that? After you've gotten all your friends in -- whom you send email to or IM regularly in any case, presumably. That's it. All done. Until, I guess, yet another social network system opens and you start all over again. These things want to be a hub for your Internet community experience, but they're just not necessary enough. Tribe gets closest, but it's nothing you're going to leave as an open window on your desktop all day. The first new social network system that builds an IM program into its structure may have a shot...

      And that has to be their goal. I mean, who builds a social network system that doesn't want people to use it all the time?

      If services like Orkut and Friendster were part of portals I was already using such as Yahoo! or MSN then I'd probably stick around but as standalone sites they just don't make much sense. Maybe part of their goal is to get bought by bigger companies who hopefully can figure out what to do with them [which seems to be the case with Orkut] in which case it looks like the dot bomb era isn't quite dead yet.


       

      Categories: Ramblings

      February 3, 2004
      @ 06:22 AM

      My work machine has a toasted hard drive, my TiVo's hard drive is also toasted which will cost $100 to get replaced under warranty, my cable splitter is busted so I can either watch cable TV or use the Internet but not both, I've had to deal with the fact that the first feature I designed from scratch for the next version of System.Xml was optimized for the wrong scenarios and should probably be pulled from beta 1 of the .NET Framework, and I just found out the deli next door simultaneously stopped carrying both Mike's Hard Iced Tea and Bacardi Silver O3 since it looks like I was the primary customer buying either beverage.

      I fucking hate Mondays.


       

      Categories: Ramblings

      Thanks to Ron Green for pointing out some problems with the recent RSS Bandit installer. Attempting an installation on a machine that didn't have RSS Bandit installed previously failed.  Also an empty docked panel showed up in the UI on successful installs. Both issues have been fixed and the installer has been refreshed.  

      As usual you can download RSS Bandit from here.

      By the way there was a feature missing from my list yesterday.

      FEATURE: Can post to your blog about a specific entry using a plugin that launches w::bloggar (if installed on the machine).

      That's it. Hopefully this should now work for everyone.


       

      Categories: RSS Bandit