December 16, 2003
@ 05:33 PM

The XML 2003 conference was a very interesting experience. Compared to the talks at XML 2002, I found the talks at XML 2003 to be of more interest and relevance to me as a developer building applications that utilize XML. The hallway and lunchtime conversations I had were particularly valuable. Below are the highlights from conversations I had with some XML luminaries at lunch and over drinks. Tomorrow I'll post about the talks I attended.

CONVERSATIONS
James Clark: He gave two excellent presentations, one on his Namespace Routing Language (NRL) and the other about some of the implementation techniques used in his nxml-mode for Emacs. I asked whether the fact that he gave no talks about RELAX NG meant he was no longer interested in the technology. He responded that there wasn't really much more to do with the language besides shepherding it through the standardization process and evangelizing it. However, given how entrenched support for W3C XML Schema was among the major vendors, evangelization was an uphill battle.

I pointed out that at Microsoft we use XML schema language technologies for two things:

    1. Describing and enforcing the contract between producers and consumers of XML documents.
    2. Creating the basis for processing and storing typed data represented as XML documents.

The only widely used XML schema language that fits the bill for both tasks is W3C XML Schema. However, W3C XML Schema is simultaneously too complex yet lacking features for the former task, while having too many features that introduce complexity for the latter. In my ideal world, people would use something like RELAX NG for the former and XML-Data Reduced (XDR) for the latter. James asked whether I saw value in creating a subset of RELAX NG that also satisfied the latter case, but I didn't think there would be a compelling argument for people who've already baked W3C XML Schema into the core of their being (e.g. XQuery, XML Web Services, etc.) to take an interest in such a subset.

In fact, I pointed out that in designing for Whidbey (the next version of the .NET Framework) we originally had an architecture with a pluggable XML type system, so that one could potentially generate Post Schema Validation Infosets (PSVIs) from other schema languages, but realized that this was a case of YAGNI. First of all, only one XML schema language exists that can generate PSVIs, so building a generic architecture made no sense when there was no other schema language that could be plugged in to replace W3C XML Schema. Secondly, one of the major benefits I had envisioned for this approach was that one would be able to plug other type systems into XQuery. This turned out to be more complicated than I thought because XQuery has W3C XML Schema deeply baked into it, and it would take more than genericizing at the PSVI level to make it work (we'd also have to genericize operators, type promotion rules, etc.); and once all that effort had been expended, any language that could be plugged in would have to act a lot like W3C XML Schema anyway. Basically, if some RELAX NG subset suddenly came into existence it wouldn't add much that we don't already get from W3C XML Schema (except less complexity, but you could get the same by coming up with a subset of W3C XML Schema or following my various W3C XML Schema Best Practices articles on XML.com).

I did think there would be some value, for developers building applications on Microsoft platforms who need more document validation features than W3C XML Schema provides, in having access to RELAX NG tools. This would be nice to have but isn't a showstopper preventing development of XML applications on Microsoft platforms (translation: Microsoft won't be building such tools in the foreseeable future). However, if such tools existed I would definitely evangelize them to our users with such document validation needs.

Sam Ruby: I learned that Sam is on one of the "emerging technologies" groups at IBM. Basically, he works on stuff that's about to become mainstream in a big way and helps it along. In the past this has included PHP, Open Source and Java (i.e. the Apache project), and XML Web Services; now it's weblogging technologies. Given his track record, I asked him to give me a buzz whenever he finds some new technology to work on. : )

I told him that I felt syndication formats weren't the problem with weblogging technologies, and he seemed to agree but pointed out that some of the problems they are trying to solve with Atom make more sense in the context of using the same format for your blog editing/management API and your archival format. There was also the assorted interpersonal conflict & psychological baggage that needed to be discarded to move the technology forward, and a clean break seemed to be the best way to do that. On reflection, I agreed with him.

I did point out that the top 3 problems I'd like to see fixed in syndication were one-click subscription, subscription harmonization and adding calendar events to feeds. I mentioned that I should have RFCs for the first two written up over the holidays, but the third is something I hadn't yet thought hard about. Sam pointed out that instead of going the route of a namespaced extension element to describe calendar events in an RSS feed, perhaps a better option is the Atom approach of using link tags. Something like

   <link type="text/calendar" href="...">

In fact, he seemed to like this idea so much that it ended up in his presentation.
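
For illustration, here's a rough sketch of what a feed entry carrying such a link might look like (the URLs are invented, and the exact markup would depend on the final Atom format):

    <entry>
      <title>XML 2003 Conference</title>
      <link type="text/html" href="http://example.com/xml2003.html" />
      <!-- text/calendar is the iCalendar MIME type; the href points
           to an .ics file describing the event -->
      <link type="text/calendar" href="http://example.com/xml2003.ics" />
    </entry>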

As Sam and I were finishing our meals, Sam talked about the immense effect blogging has had on his visibility. Before blogging he was well known in tight-knit technical circles, such as amongst the members of the Apache project, but now he knows people from all over the world working at diverse companies and regularly has people go "Wow, you're Sam Ruby, I read your blog". Just as he said this, the guy sitting across from us at the table said "Wow, you're Sam Ruby, I read your blog". Sam turned to me and said, "See what I mean?"

The power of blogging...

Eve Maler: I spoke to her about a talk I'd seen on UBL, given by Eduardo Gutentag and Arofan Gregory, where they talked about putting the polymorphic features of W3C XML Schema to good use in business applications. The specific scenario they described was the following:

Imagine a small glue supplier that provides glue to diverse companies such as a shoe manufacturer, an automobile manufacturer and an office supplies company. This supplier uses UBL to talk to each of its customers, who also use UBL, but since the types for describing purchase orders and the like are not specific enough for them, the customers use the type derivation features of W3C XML Schema to create more specific types (e.g. a hypothetical LineItem type from UBL is derived into AutomobilePart or ShoeComponent by the various companies). However, the small glue company can handle all the new types with the same code if it uses type-aware processing, such as the following XPath 2.0 or XQuery expression, which matches all instances of the LineItem type

element(*, LineItem)
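
To make the scenario concrete, the derivation being described might look something like this hypothetical schema fragment (the names and content models are invented, not UBL's actual definitions):

    <!-- hypothetical UBL-style base type -->
    <xs:complexType name="LineItem">
      <xs:sequence>
        <xs:element name="Description" type="xs:string"/>
        <xs:element name="Quantity" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>

    <!-- a customer derives its own type; element(*, LineItem)
         still matches instances of the derived type -->
    <xs:complexType name="AutomobilePart">
      <xs:complexContent>
        <xs:extension base="LineItem">
          <xs:sequence>
            <xs:element name="PartNumber" type="xs:string"/>
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>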

The presenters then pointed out that there could be data loss if one of the customers extended the LineItem type by adding information pertinent to their business (e.g. priority, pricing information, preferred delivery options, etc.), since such code would not know about the extensions.

This seems like a horrible idea and yet another reason why I view all the "object oriented" features of W3C XML Schema with suspicion.

Eve agreed that it probably was a bad idea to recommend that people process XML documents this way, then stated that calling such processing "polymorphic" didn't sit right with her, since true polymorphism doesn't require subtype relationships. I agreed and disagreed with her. There are at least four kinds of polymorphism in programming language parlance (coercion, overloading, parametric and subtype polymorphism), and the kind used above is subtype polymorphism; it is only one of the four, but the behavior above is still polymorphism. From talking to Eve, it seemed she was more interested in parametric polymorphism, because subtype polymorphism is not a loosely coupled approach. I pointed out that just using XPath expressions to match on predicates could be considered parametric polymorphism, since you are treating instances the same way even though they are of different types, as long as they satisfy the same constraints. I'm not sure she agreed with me. :)
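
To put the taxonomy in familiar programming terms, here's a minimal C# sketch (the types are invented): code written against a base class exhibits subtype polymorphism, which is essentially what element(*, LineItem) does at the schema level.

    // Subtype polymorphism: code written against the base type accepts
    // any derived type, but only types that declare LineItem as an ancestor.
    public class LineItem { public decimal Quantity; }
    public class AutomobilePart : LineItem { public string PartNumber; }

    public class Demo
    {
        public static decimal TotalQuantity(LineItem[] items)
        {
            decimal total = 0;
            foreach (LineItem item in items)
                total += item.Quantity; // safe for AutomobilePart too
            return total;
        }
    }

Matching on an XPath predicate, by contrast, accepts anything with the right shape regardless of its declared ancestry, which is why it feels closer to the loosely coupled processing Eve was after.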

Jon Udell: We discussed the online exchange we had about WinFS types and W3C XML Schema types. He apologized if he seemed to be coming on too strong in his posts, and I responded that of the hundreds of articles and blog posts I'd read about the technologies unveiled at the recent Microsoft Professional Developers Conference (PDC), I'd only seen two people provide insightful feedback; his was the first and Miguel de Icaza's PDC writeup was the second.

Jon felt that WinFS would be more valuable as an XML database as opposed to an object-oriented database (I think the terms he used were "XML store" and "CLR store"), especially given his belief that XML enables the "Universal Canvas". I agreed with him but pointed out that Microsoft isn't a single entity; even though some parts may think that XML takes us one step closer to a universal data interchange format, and thus universal data access, there are others who see XML as "that format you use for config files" and express incredulity when they hear about things like XQuery, because they wonder why anyone would need a query language for their config files. :)

Reading Jon's blog post about Word 11, XML and the Universal Canvas, it seems he's been anticipating a unified XML storage model for a while, which explains his disappointment that the WinFS unveiled at PDC was not it.

He also thought it was fantastic that so many people at Microsoft were blogging.


 

Categories: XML

December 16, 2003
@ 06:52 AM

Robert Scoble writes

Here's what I'd do if I were at Harvard and in charge of the RSS spec:

1) Announce there will be an RSS 3.0 and that it will be the most thought-out syndication specification ever.

2) Announce that RSS 3.0 will ship on July 1, 2005. That date is important. For one, 18 months is long enough to really do some serious work. For two, RSS 3.0 should be positioned as "the best way to do syndication on Microsoft's Longhorn." ...

3) Open up a mailing list, a wiki, and a weblog to track progress on RSS 3.0 and encourage community inclusion.

4) Work with Microsoft to ensure that RSS 3.0 will be able to take advantage of Longhorn's new capabilities (in specific, focus on learning Indigo and WinFS)...

5) Make sure RSS 3.0 is simply the best-of-breed syndication protocol. Translation: don't let Microsoft or Google come up with a better spec that has more features.

I'm terribly amused by the fact that Robert Scoble likes to claim he doesn't represent Microsoft in his blog, then posts items where he basically acts like he does. An RSS 3.0 that competes with Atom is probably the worst possible proposal for resolving the current conflict in the website syndication space, and a very clear indication that this is all about personality conflicts. The problem with the Atom syndication format is that it is an incompatible alternative to RSS 1.0/RSS 2.0 that provides little if any benefit to content producers or news aggregator consumers. Coming up with another version of RSS doesn't change this fact unless it is backwards compatible, and even then, besides clarifications to the original spec, I'm unsure what could usefully be added to the core (although I can think of a number of potential candidates). Even so, this would still be a solution looking for a problem.

While talking to Tim Bray and Sam Ruby at XML 2003 last week, I stated that a number of the problems with syndication have little to do with the core spec, and most aggregator authors wouldn't consider any of the problems harped upon on the Atom lists a big deal. The major problems with syndication today have little to do with the syndication format and more to do with its associated technologies.

As little interest as I have in an Atom syndication format, I have an order of magnitude less interest in a new version of RSS that exists solely to compete with Atom.

PS: Am I the only one who caught the trademark Microsoft arrogance (which really comes from working on Windows[0]) in Scoble's post? I especially liked

"Here's what I'd do if I were at Harvard and in charge of the RSS spec...Work with Microsoft to ensure that RSS 3.0 will be able to take advantage of Longhorn's new capabilities (in specific, focus on learning Indigo and WinFS). Build a prototype (er, have MSN build one) that would demonstrate some of the features of RSS 3.0 -- make this prototype so killer that it gets used on stage at the Longhorn launch

I literally guffawed. So if Harvard doesn't tie RSS to Windows, then all is lost? I guess this means that NetNewsWire and Straw should get ready to be left behind in the new Microsoft-controlled RSS future. Hilarious.

[0] When you work on the most popular piece of software in the world you tend to have a different perspective from most other software developers in the world including within Microsoft.


 

Categories: Ramblings

December 15, 2003
@ 05:04 PM

James Robertson writes

Ed Foster points out that MS - like many other vendors - is forbidding benchmarks as part of their standard contracts:

Is it possible Microsoft has something to hide about the performance of its server and developer products? It's hard to escape that conclusion when you see how many of its license agreements now contain language forbidding customers of those products from disclosing benchmark results.

...
So what are MS and the other vendors afraid of?

I'm not sure what the official line is on these contracts, but I've come to realize why the practice is popular among software vendors. A lot of the time, people who perform benchmarks are familiar with one or two of the products they are testing and know how to tune those for optimal performance but not the others, which leads to skewed results. I know that at least on the XML team at Microsoft we don't block people from publishing benchmarks if they come to us; we just ensure that their tests are apples-to-apples comparisons and not unfairly skewed to favor the other product(s) being tested.

Just a few days ago I attended a session at XML 2003 entitled A Comparison of XML Processing in .NET and J2EE, where the speaker stated that push-based XML parsers like SAX were more performant than pull-based XML parsers like the .NET Framework's XmlReader when dealing with large documents. He didn't give any details and implied that they were lacking because of the aforementioned EULA clauses. Without any details, sample code or a definition of what document size is considered "large" (1MB, 10MB, 100MB, 1GB?), it's difficult to agree or disagree with his statement. Off the top of my head, there aren't any inherent limitations of pull-based XML parsing that should make it perform worse than push-based parsing of XML documents, although implementation differences can make all the difference. I suspect that occurrences like this are why many software vendors tend to have clauses limiting the disclosure of benchmark information in their EULAs.
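
For context, pull-based parsing with the .NET Framework's XmlReader looks something like this minimal sketch (the file name is made up, and counting elements stands in for real processing). Nothing in this pattern inherently streams the document any differently than SAX callbacks do:

    using System;
    using System.Xml;

    public class PullParseSketch
    {
        public static void Main()
        {
            // The caller pulls nodes one at a time instead of receiving
            // push-style callbacks as with SAX.
            XmlTextReader reader = new XmlTextReader("large-document.xml");
            int elements = 0;
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    elements++;
            }
            reader.Close();
            Console.WriteLine("Element count: {0}", elements);
        }
    }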

Disclaimer: The above is my personal opinion and is in no way, shape or form an official statement of the position of my employer.


 

Categories: Life in the B0rg Cube

I'm now experimenting with various Windows CVS clients to see which best suits my needs for RSS Bandit development. So far I have tried WinCVS, which seems OK, and I'm about to evaluate TortoiseCVS, which Torsten seems very happy with.

Later on I'll experiment with CVS plugins that integrate into Visual Studio, such as Jalindi Igloo or the SCC plugin for TortoiseCVS. I never got to use the Visual Studio plugin for GotDotNet Workspaces when RSS Bandit was hosted there because the IDE I originally developed RSS Bandit with (Visual C# 2002 Standard Edition) did not support said plugin, so I am curious what development with source repository access built into the IDE feels like.


 

Categories: RSS Bandit

December 15, 2003
@ 04:32 PM

It seems my recent post about moving RSS Bandit from GotDotNet Workspaces to SourceForge has led to some discussion about the motivations for the move. I've seen the question asked on Daniel Cazzulino's weblog and on the RSS Bandit message board on GotDotNet. Below is my answer, phrased in the form of a top 10 list, which I posted in response on the GotDotNet message board and also sent to Andy Oakley.

Top 10 reasons why we moved to SourceForge

1. Doesn't require people to have Passport accounts to download the RSS Bandit installer.

2. We get download and page load statistics.

3. Bug reports can have file attachments. This is great since a lot of the time we end up wishing people would attach their error.log or feedlist.xml file to their bug reports.

4. We can get a mailing list if we want.

5. Separate databases for features vs. bugs.

6. Source code can be browsed over HTTP via ViewCVS without having to install any software.

7. Larger quotas on how much you can store on their servers.

8. The bug tracker remembers your queries, and the default query is more useful to me (all open bugs) than GDN's (all bugs assigned to me, even closed ones).

9. The activity score more accurately reflects the activity of the project (on GDN, BlogX has a 99% activity score even though the project has been dead, for all intents and purposes, for several months).

10. With SourceForge we get to use the BSD license.

I hope this satisfies the curiosity of those wondering why RSS Bandit moved to SourceForge. I've been using it for a few days and I'm already much happier with it, despite some initial teething problems adding modules to CVS.


 

Categories: RSS Bandit

According to Reuters

WASHINGTON (Reuters) - A Pentagon audit of Halliburton, the oil services firm once run by Vice President Dick Cheney, found the company may have overbilled the U.S. government by more than $120 million on Iraq contracts, U.S. defense officials said on Thursday.

Why am I not surprised? This entire Iraq war fiasco will be the subject of much consternation and entertainment to future generations.


 

December 12, 2003
@ 12:19 PM

Today is the last day of the XML 2003 conference. So far it's been a pleasant experience.

XML IN THE REAL WORLD

Attendance at the conference was much lower than last year. Considering that last year Microsoft announced Office 2003 at the conference while this year there was no such major event, this is no surprise. I suspect another reason is that XML is no longer new and is now so low down in the stack that a conference dedicated to just XML is no longer that interesting. Of course, this is only my second conference, so this level of attendance may be typical of previous years and I may have just witnessed an abnormality last year.

Like last year, the conference seemed targeted mostly at the ex-SGML crowd (or document-centric XML users), although this time there wasn't the significant focus on Semantic Web technologies such as topic maps that I saw last year. I did learn a new buzzword around Semantic Web technologies, Semantic Integration, and found out that there are companies selling products that claim to do what, until this point, I'd assumed was mostly theoretical. I tried to ask one such vendor how they deal with some of the issues of non-trivial transformation, such as the pubDate vs. dc:date example from a previous post, but he glossed over the details, implying that besides using ontologies to map between vocabularies they allowed people to inject code where it was needed. This seems to confirm my suspicion that in the real world you end up either using XSLT or reinventing XSLT to perform transformations between XML vocabularies.
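
The pubDate vs. dc:date case shows why: the structural part of the mapping is a trivial XSLT template, but converting the RFC 822 date string RSS 2.0 uses into the ISO 8601 format dc:date expects requires real code, which is exactly where a pure ontology mapping runs out of steam. A sketch (a fragment of a larger transform, for illustration only):

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <xsl:template match="pubDate">
        <dc:date>
          <!-- renaming the element is the easy part; reformatting the
               date value itself requires string-parsing logic or an
               extension function -->
          <xsl:value-of select="."/>
        </dc:date>
      </xsl:template>
    </xsl:stylesheet>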

From looking at the conference schedule, it is interesting to note that some XML technologies got a lot less coverage at the conference relative to how much discussion they generate in the news or the blogosphere. For example, I didn't see any sessions on RSS, although there is one by Sam Ruby on Atom scheduled for later this morning. Also, there didn't seem to be much about the XML Web Service technologies being produced by the major vendors such as IBM, BEA or Microsoft. I can't tell if this is because there was no interest in submitting such sessions or whether the folks who picked the sessions didn't find these technologies interesting. Based on the fact that a number of the folks with "Reviewer" on their conference badges were from the old-school SGML crowd, I suspect the latter. In a number of cases there definitely seemed to be a disconnect between the technologies covered during the conference and how XML is used in the real world.

MEETING XML GEEKS

I've gotten to chat with a number of people I've exchanged mail with but never met, including Tim Bray, Jon Udell, Sean McGrath, Norm Walsh and Betty Harvey. I also got to talk to a couple of folks I met last year, like Rick Jelliffe, Sam Ruby, Simon St. Laurent, Mike Champion and James Clark. Most of the hanging out occurred at the soiree at Tim and Lauren's. As Tim mentions in his blog post, there were a couple of "Wow, you're Dare?" or "Wow, you're Sean McGrath?" moments throughout the evening. The coolest part of the evening was that I got to meet Eve Maler, whom I was all star-struck about meeting since I'd been seeing her name crop up as one of the Über-XML geeks at Sun Microsystems since I was a programming whelp back in college. So I'm there gushing "Wow, you're Eve Maler" and she was like "Oh, you're Dare? I read your articles, they're pretty good". Sweet. Since Eve worked at Sun, I intended to give her some light-hearted flak over a presentation entitled UBL and Object-Oriented XML: Making Type-Aware Systems Work, which was spreading the notion that relying on the "object oriented" features of W3C XML Schema was a good idea, but it turned out that she agreed with me. Methinks another W3C XML Schema article on XML.com could be spawned from this. Hmmmm.


 

Categories: XML

December 11, 2003
@ 03:42 PM

The new home of the RSS Bandit project is on SourceForge. Various things precipitated this move, the most recent being the fact that a Passport account was needed to download RSS Bandit from GotDotNet. I'd like to thank Andy Oakley for all his help with GotDotNet Workspaces while RSS Bandit was hosted there.

The most current release of RSS Bandit is still v1.2.0.61; you can now download it from SourceForge here. The source code is still available, and you can now browse the RSS Bandit CVS repository if you're interested in such things.


 

Categories: RSS Bandit

Jeremy Zawodny writes

The News RSS Feeds are great if you want to follow a particular category of news. For example, you might want to read the latest Sports (RSS) or Entertainment (RSS) news in your aggregator. But what if you'd like an RSS News feed generated just for you? One based on a word or phrase that you could supply?
...
 For example, if you'd like to follow all the news that mentions Microsoft, you can do that. Just subscribe to this url. And if you want to find news that mentions Microsoft in a financial context, use Microsoft's stock ticker (MSFT) as the search parameter like this.

Compare this to how you'd programmatically do the same thing with Google using the Google Web API, which utilizes SOAP & WSDL. Depending on whether or not you have the right toolkit, the Google Web API is either much simpler or much harder to program against than the Yahoo RSS-based search. With the Yahoo RSS-based search, a programmer has to deal directly with HTTP and XML, while with the Google API and the appropriate XML Web Service tools this is all hidden behind the scenes; for the most part the developer programs against objects that represent the Google API without dealing directly with XML or HTTP. For example, see this example of talking to the Google API from PHP. Without appropriate XML Web Service tools, the Google API is more complex to program against than the Yahoo RSS search because one now has to deal with sending and receiving SOAP requests, not just regular HTTP GETs. However, there are a number of XML Web Service toolsets freely available, so there should be no reason to program against the Google API directly.
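
To illustrate just how thin the RSS-based approach is, here's a rough C# sketch of consuming such a feed with a plain HTTP GET (the query URL shown is illustrative, not Yahoo's documented endpoint):

    using System;
    using System.Xml;

    public class RssSearchSketch
    {
        public static void Main()
        {
            // A plain GET against a query URL; no SOAP envelope or WSDL needed.
            string url = "http://news.search.yahoo.com/news/rss?p=Microsoft";
            XmlTextReader reader = new XmlTextReader(url);
            while (reader.Read())
            {
                // print each channel/item title as it streams past
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "title")
                    Console.WriteLine(reader.ReadString());
            }
            reader.Close();
        }
    }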

That being said, there are a number of benefits to the URI-based (i.e. RESTful) search that Yahoo provides which come from being a part of the Web architecture:

  1. I can bookmark a Yahoo RSS search or send a link to it in an email. I can't do the same with an RPC-style SOAP API.
  2. Intermediaries between my machine and Google are unlikely to cache the results of a search made via the Google API, since it uses HTTP POST, but they could cache requests that use the Yahoo RSS-based search, since it uses HTTP GET. This improves the scalability of the Yahoo RSS-based search without any explicit work from me or Yahoo; it falls out of utilizing the Web architecture.

The above contrast of the differing techniques Yahoo and Google use to return search results as XML is a good way to compare RESTful XML Web Services with RPC-based XML Web Services, and to understand why some people believe strongly [perhaps too strongly] that XML Web Services should be RESTful, not RPC-based.

By the way, I noticed that Adam Bosworth is trying to come to grips with REST, which should lead to some interesting discussion for those interested in the RESTful XML Web Services vs. RPC-based XML Web Services debate.


 

Categories: XML

In the most recent release of RSS Bandit, we started down the path of making it look like Outlook 2003 by using Tim Divil's WinForms controls. The primary change we made was changing the toolbar. This change wasn't good enough for Thomas Freudenberg, who made a couple of other changes to RSS Bandit that make it look even more like Outlook 2003. He wrote

Anyway, contrary to SharpReader, you can get the bandit's source code. Because I really like the UI of Outlook 2003 (and FeedDemon), I patched RSS Bandit:

It took about 15 minutes. Mainly I docked the feed item list and the splitter to the left. Additionally, I've created a custom feed item formatter, which bases on Julien Cheyssial's MyBlogroll template. You can download my XSLT file here.

You can find a screenshot on his website. Torsten's already made similar changes to the official RSS Bandit source code after seeing Thomas's feedback.


 

Categories: RSS Bandit