We have a few open developer and program manager positions on the WebData XML team at Microsoft. Are you interested in working on implementing XML technologies that will impact not only significant aspects of Microsoft but the software industry at large? Would you like to get a chance to collaborate with teams as diverse as Office, Windows (Avalon, Indigo & WinFS), BizTalk, SQL Server and Visual Studio on building the next generation of XML technologies for the Microsoft platform? Do you get passionate about XML or related technologies? If so, take a gander at the following open job descriptions on our team and, if you believe you qualify, send mail to xmljobATmicrosoft.com

See you soon.


 

Categories: Life in the B0rg Cube

It seems every few months there is a series of blog posts or articles about why returning ADO.NET DataSet objects from XML Web Services is a bad idea. I saw the most recent incarnation of this perma-debate in Scott Hanselman's post Returning DataSets from WebServices is the Spawn of Satan and Represents All That Is Truly Evil in the World and Ted Neward's More on why DataSets are the Root of all Evil.

I was going to type up a response to both posts until I saw Doug Purdy's amusing response, PurchaseOrders are the root of all evil, which succinctly points out the flaws in Scott and Ted's arguments.

Now I'm off to bed.


 

Categories: Mindless Link Propagation | XML

June 3, 2004
@ 07:19 AM

I've been thinking a bit about false goals and software projects. Often decisions about the design of a technology or product are made early in the life of a software project, based on certain assumptions about the software landscape. In many cases these design principles lose relevance as the project goes on, yet the original design principles of the project are rarely questioned. This leads to members of the project chasing goals that aren't beneficial to the product or to its customers and which may in fact be detrimental. These are false goals.

Always remember to question everything.


 

Categories: Ramblings

I just read Tim Bray's entry entitled SOA Talk where he mentions listening to Steve Gillmor, Doc Searls, Jon Udell, Dana Gardner, and Dan Farber talk about SOA via “The Gillmor Gang” at ITConversations. I tried to listen to the radio show a few days ago but had the same problems Tim had. A transcript would definitely be appreciated.

What I found interesting is this excerpt from Tim Bray's blog post

Apparently a recent large-scale survey of professionals revealed that “SOA” has positive buzz and high perceived relevance, while “Web Services” scores very low. Huh?

This is very unsurprising to me. Regular readers of my blog may remember I wrote about the rise of the Service Oriented Architecture fad a few months ago. Based on various conversations with different people involved with XML Web Services and SOA I tend to think my initial observations in that post were accurate. Specifically I wrote

The way I see it the phrase "XML Web Services" already had the baggage of WSDL, SOAP, UDDI, et al so a new buzzphrase was needed that highlighted the useful aspects of "XML Web Services" but didn't tie people to one implementation of these ideas but also adopted the stance that approaches such as CORBA or REST make sense as well.

Of the three words in the phrase "XML Web Services" the first two are implementation specific and not in a good way. XML is a good thing primarily because it is supported by lots of platforms and lots of vendors, not because of any inherent suitability of the technology for a number of the tasks people utilize it for. However in situations where this interop is not really necessary then XML is not really a good idea. In the past, various distributed computing aficionados have tried to get around this by talking up the InfoSet which was just a nice way of deprecating the notion of usage of the XML text format everywhere being a good thing. The second word in the phrase is similarly inapplicable in the general case. Most of the people interested in XML Web Services are interested in distributed computing which traditionally and currently is more about the intranet than it is about the internet. The need to justify the Web-like nature of XML Web Services when in truth these technologies probably aren't going to be embraced on the Web in a big way seems to have been a sore point of many discussions in distributed computing circles.

Another reason I see for XML Web Services having negative buzz versus SOA is that when many people think of XML Web Services, they think of overhyped technologies that never delivered such as Microsoft's Hailstorm.  On the other hand, SOA is about applying the experiences of 2 decades of building distributed applications to building such applications today and in the future. Of course, there are folks at Microsoft who are wary of being burned by the hype bandwagon and there've already been some moves by some of the thought leadership to distance what Microsoft is doing from the SOA hype. One example of this is the observation that lots of the Indigo folks now talk about 'Service Orientation' instead of 'Service Oriented Architecture'.

Disclaimer: The above comments do not represent the thoughts, intentions, plans or strategies of my employer. They are solely my opinion.


 

Categories: Technology | XML

Bryan Kam has reviewed a couple of free RSS aggregators for Windows. Below are excerpts of his reviews including his final choice  

I began with the aptly-named and small FeedReader 2.5. While it has all the basic features covered, it lacks a lot of things I like...Score: 3/10. Fast but featureless.

Next I tried Sharpreader 0.9.4.1. This is a pretty good one, which features different sorting, various update times, alerts, inherited properties, can import/export OPML...It would take 40+ MB RAM on my desktop computer, and sometimes would take 100% cycles for no reason...Score: 5/10. Full of features, but slow as hell!

Another one I tried a while back was Syndirella 0.9b. While I was not a big fan of the Windows 3.1-esque interface, it does have a rudimentary scraper... This is great for sites that don't offer feeds. Other than that, though, this reader is pretty lacking, not even having categories which are a necessity in my opinion. Score: 5/10. Nice scraper, the rest kinda sucks.

Currently I'm using Abilon 2.0, which has many of the features I like...The interface is divided into three vertical columns: the far left is the list of feeds, the middle is the items in the selected feed, the right is the detail for the selected item. I find this very weird. Score: 7/10. It's got the goods, it's small, but it's not fun to use.

Okay, another brief RSS reader review. This one is called RSS Bandit and I've discarded Abilon in favor of it...Feature-wise it's pretty standard. The little slide-up alerts, which many of these readers have, is actually reliably click-able in this program...Another good feature is its "Locate RSS" feeds which attempts to find a feed for whatever websites or keywords you enter. 8/10. Decent, but lacks that extra something.

It's good to read first hand accounts of what people like or dislike about RSS Bandit especially when compared to other RSS aggregators. I tend to agree with Bryan that RSS Bandit currently leads the pack amongst the major free RSS aggregators for Windows. The next release will aim at being competitive with commercial aggregators such as FeedDemon and NewzCrawler.

This should be a fun summer.  


 

Mark Pilgrim has a blog post entitled how to make a linkblog in Atom which shows one technique for syndicating a list of links in an Atom feed. Unfortunately there is one problem with Mark's article: the technique it recommends violates the Atom 0.3 specification and generates an invalid feed.

There are two problem sections in Mark's article. In the first, How to link to an article, he writes

But what about the super-fascinating thing we're actually linking to? That goes in its own element.

<link rel="related" type="text/html"
     href="http://home.introweb.nl/~dodger/itunesserver.html"
     title="Setting up an iTunes server in FreeBSD"/>

and in the section entitled How to credit people whose links you republish he writes

Simply put, a "via" link is a link back to where you found the link you're posting. In this example, I discovered the article on setting up a FreeBSD iTunes server via Jeffrey Veen, so let's give him some credit:

<link rel="via" type="text/html" href="http://www.veen.com/jeff/archives/000545.html" title="Jeffrey Veen"/>

The problem with both sections is that Mark uses values for the rel attribute that are not considered valid by the Atom 0.3 specification. In Section 3.4.1 of the Atom specification it states

3.4  Link Constructs

A Link construct is an element that MUST NOT have any child content, and has the following attributes:

3.4.1  "rel" Attribute

The "rel" attribute indicates the type of relationship that the link represents. Link constructs MUST have a rel attribute, whose value MUST be a string, and MUST be one of the values enumerated in the Atom API specification http://bitworking.org/projects/atom/draft-gregorio-09.html.

On navigating to the provided URL and reading Section 5.4.1 of the Atom API specification, which defines the valid values of the rel attribute of the link element, there is the following list

5.4.1  rel

This attribute describes the relationship from the current document, be it HTML or Atom, to the anchor specified by the href attribute. The value of this attribute is a space-separated list of link types. Note that these values are case insensitive. With type="application/x.atom+xml" we have the following interpretations of the relations.

alternate
The URI in the href attribute points to an alternate representation of the containing resource.
start
The Atom feed at the URI supplied in the href attribute contains the first feed in a linear sequence of entries.
next
The Atom feed at the URI supplied in the href attribute contains the next N entries in a linear sequence of entries.
prev
The Atom feed at the URI supplied in the href attribute contains the previous N entries in a linear sequence of entries.
service.edit
The URI given in the href attribute is used to edit a representation of the referred resource.
service.post
The URI in the href attribute is used to create new resources.
service.feed
The URI given in the href attribute is a starting point for navigating content and services.

As can be seen, neither related nor via, both of which are used in Mark's article, is in the above list. I had expected the Feed Validator written by Mark Pilgrim and Sam Ruby to flag this error but currently when one validates Mark's b-links feed it validates as Valid Atom. I have filed bug #963354 in the Feed Validator's Bug Database about this issue. Hopefully this error will be resolved soon.

On a final note, it is bad enough that we are going to have to deal with two versions of Atom in the wild (Atom 0.3 and whatever comes out of the standards process). It would be unfortunate to further fragment this by deploying intermediate versions of the format based on mailing list discussions. One of the benefits of Atom is supposed to be that it will usher in an era of rigorously defined specifications in the syndication space. That won't be worth much if people ignore the specifications and go their own way.
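
The check I expected the validator to perform can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the sample feed and the example.org URLs are made up for this post, the allowed values are copied from the list quoted above, and this is not how the Feed Validator is actually implemented.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 0.3 feeds.
ATOM_NS = "http://purl.org/atom/ns#"

# rel values copied from Section 5.4.1 of the Atom API draft quoted above.
ALLOWED_REL = {
    "alternate", "start", "next", "prev",
    "service.edit", "service.post", "service.feed",
}

def invalid_link_rels(feed_xml):
    """Return every rel value in the feed that is not in the quoted list.

    Hypothetical helper: it does not report links that omit rel entirely.
    """
    root = ET.fromstring(feed_xml)
    bad = []
    for link in root.iter("{%s}link" % ATOM_NS):
        # The draft describes rel as a space-separated, case-insensitive list.
        for rel in link.get("rel", "").lower().split():
            if rel not in ALLOWED_REL:
                bad.append(rel)
    return bad

# Made-up sample feed that mimics the two link styles from Mark's article.
SAMPLE = """<feed version="0.3" xmlns="http://purl.org/atom/ns#">
  <entry>
    <link rel="alternate" type="text/html" href="http://example.org/1"/>
    <link rel="related" type="text/html" href="http://example.org/2"/>
    <link rel="via" type="text/html" href="http://example.org/3"/>
  </entry>
</feed>"""

print(invalid_link_rels(SAMPLE))  # ['related', 'via']
```

Run against the sample, the helper reports exactly the two rel values that the specification does not list.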


 

Yesterday I went to the Apple store in the Bellevue mall to replace the headphones on my iPod which had begun to fray. When I walked up to the counter and told the girl there what I wanted she ushered me to a customer service desk claiming that if my iPod was under warranty I could get the headphones replaced for free. I was highly skeptical of this since I didn't buy the iPod at the Apple Store but at Best Buy and didn't even have my receipt anyway.

Waiting at the customer service desk I got to soak in some of the ambiance of the Apple Store. It is definitely a cool place. I liked the flat screen TV over the customer service desk with quotes from luminaries across history such as

  • Plato is my friend, Aristotle is my friend but my best friend is truth - Sir Isaac Newton
  • We must be the change we wish to see in the world - Mahatma Gandhi

When it was finally my turn, my name was displayed on the flat screen TV above the customer support desk and I walked up to be served. I told the guy behind the desk that I needed some new headphones and the girl behind the counter had directed me to him to see if I could get them replaced by the warranty. I explained that I thought this would be unlikely given that I bought the iPod at Best Buy not the Apple Store and didn't have a receipt. To which he replied “It's an Apple product right? I'll just check the serial number”. To my surprise he did just that and I walked out of there with brand new headphones. To cap the experience he also fixed some weird issues I'd been having with my iPod by pointing me to the recent iPod firmware update.

That's what I call fantastic customer service. I felt so good about Apple afterwards I felt like going back to the store and buying some Apple stuff but there's nothing I need right now.  


 

Categories: Ramblings

May 28, 2004
@ 06:52 PM

C.J. Date, one of the most influential names in the relational database world, had some harsh words about XML's encroachment into the world of relational databases in a recent article entitled Date defends relational model that appeared on SearchDatabase.com. Key parts of the article are excerpted below

Date reserved his harshest criticism for the competition, namely object-oriented and XML-based DBMSs. Calling them "the latest fashions in the computer world," Date said he rejects the argument that relational DBMSs are yesterday's news. Fans of object-oriented database systems "see flaws in the relational model because they don't fully understand it," he said.

Date also said that XML enthusiasts have gone overboard.

"XML was invented to solve the problem of data interchange, but having solved that, they now want to take over the world," he said. "With XML, it's like we forget what we are supposed to be doing, and focus instead on how to do it."

Craig S. Mullins, the director of technology planning at BMC Software and a SearchDatabase.com expert, shares Date's opinion of XML. It can be worthwhile, Mullins said, as long as XML is only used as a method of taking data and putting it into a DBMS. But Mullins cautioned that XML data that is stored in relational DBMSs as whole documents will be useless if the data needs to be queried, and he stressed Date's point that XML is not a real data model.

Craig Mullins's points are more straightforward to answer since his comments don't jibe with the current state of the art in the XML world. He states that you can't query XML documents stored in databases but this is untrue. Almost three years ago, I was writing articles about querying XML documents stored in relational databases. Storing XML in a relational database doesn't mean it has to be stored as an opaque binary BLOB or as a big bunch of text which cannot effectively be queried. The next version of SQL Server will have extensive capabilities for querying XML data in a relational database and doing joins across relational and XML data; a lot of this functionality is described in the article on XML Support in SQL Server 2005. As for XML not having a data model, I beg to differ. There is a data model for XML that many applications and people adhere to, often without realizing that they are doing so. This data model is the XPath 1.0 data model, which is being updated to handle typed data as the XQuery and XPath 2.0 data model.

Now to tackle the meat of C.J. Date's criticisms, which is that XML solves the problem of data interchange but now is showing up in the database. The first point I'd like to point out is that there are two broad usage patterns of XML: it is used to represent both rigidly structured tabular data (e.g., relational data or serialized objects) and semi-structured data (e.g., office documents). The latter type of data will only grow now that office productivity software like Microsoft Office has enabled users to save their documents as XML instead of proprietary binary formats. In many cases, these documents cannot simply be shredded into relational tables. Sure you can shred an Excel spreadsheet written in SpreadsheetML into relational tables but is the same really feasible for a Word document written in WordprocessingML? Many enterprises would rather have their important business data stored and queried from a unified location instead of the current situation where some data is in document management systems, some hangs around as random files in people's folders while some sits in a database management system.
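
To illustrate that XML kept in a relational table need not be opaque, here is a toy sketch. It is not SQL Server's XML support: it assumes SQLite plus the limited XPath subset in Python's ElementTree, and the doc table, its columns and the sample reports are all invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table: relational columns plus a column holding whole XML documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
conn.executemany("INSERT INTO doc VALUES (?, ?, ?)", [
    (1, "alice", "<report><section title='Budget'><p>Q1 numbers</p></section></report>"),
    (2, "bob", "<report><section title='Hiring'><p>Two openings</p></section></report>"),
    (3, "alice", "<report><section title='Hiring'><p>One opening</p></section></report>"),
])

def docs_matching(author, xpath):
    """Ids of an author's documents whose XML body matches the XPath expression."""
    # The relational predicate narrows the rows; the XPath looks inside each document.
    rows = conn.execute(
        "SELECT id, body FROM doc WHERE author = ? ORDER BY id", (author,))
    return [doc_id for doc_id, body in rows
            if ET.fromstring(body).find(xpath) is not None]

result = docs_matching("alice", "./section[@title='Hiring']")
print(result)  # [3]
```

A database with native XML support would evaluate the XPath part inside the engine rather than in client code, but the shape of the query, a relational filter combined with a structural one, is the same.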

As for stating that critics of the relational model don't understand it, I disagree. One of the major benefits of using XML in relational databases is that it is a lot easier to deal with fluid schemas or data with sparse entries with XML. When the shape of the data tends to change or is not fixed the relational model is simply not designed to deal with this. Constantly changing your database schema is simply not feasible and there is no easy way to provide the extensibility of XML where one can say "after the X element, any element from any namespace can appear". How would one describe the capacity to store “any data” in a traditional relational database without resorting to an opaque blob?

I do tend to agree that some people are going overboard and trying to model their data hierarchically instead of relationally, which experience has taught us is a bad idea. Recently there was a thread on the XML-DEV mailing list entitled Designing XML to Support Information Evolution where Roger L. Costello described his travails trying to model his data, which was being transferred as XML, in a hierarchical manner. Michael Champion accurately described the process Roger Costello went through as having "rediscovered the relational model". In a response to that thread I wrote "Hierarchical databases failed for a reason".

Using hierarchy as a primary way to model data is bad for at least the following reasons

  1. Hierarchies tend to encourage redundancy. Imagine I have a <Customer> element that has one or more <ShippingAddress> elements as children as well as one or more <Order> elements. Each order was shipped to an address, so if modelled hierarchically each <Order> element will also have a <ShippingAddress> element, which leads to a lot of unnecessary duplication of data.
  2. In the real world, there are often multiple groups to which a piece of data belongs which often cannot be modelled with a single hierarchy.  
  3. Data is too tightly coupled. If I delete a <Customer> element, this means I've automatically deleted his entire order history since all the <Order> elements are children of <Customer>. Similarly if I query for a <Customer>, I end up getting all the <Order> information as well.
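
For contrast, here is a minimal sketch of the same customer, address and order data modelled relationally. It assumes SQLite via Python's sqlite3 module, and the table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the references
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE shipping_address (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    street TEXT NOT NULL
);
CREATE TABLE purchase_order (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    shipping_address_id INTEGER NOT NULL REFERENCES shipping_address(id),
    item TEXT NOT NULL
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.execute("INSERT INTO shipping_address VALUES (10, 1, '1 Main St')")
# Both orders point at the single stored address instead of copying it.
conn.executemany("INSERT INTO purchase_order VALUES (?, ?, ?, ?)",
                 [(100, 1, 10, "book"), (101, 1, 10, "lamp")])

# Asking for the customer returns only the customer, not the order history.
customer_row = conn.execute(
    "SELECT id, name FROM customer WHERE id = 1").fetchone()

# Orders are joined to their address only when that is what is asked for.
orders = conn.execute("""
    SELECT o.item, a.street
    FROM purchase_order AS o
    JOIN shipping_address AS a ON a.id = o.shipping_address_id
    WHERE o.customer_id = 1
    ORDER BY o.id
""").fetchall()

address_count = conn.execute(
    "SELECT COUNT(*) FROM shipping_address").fetchone()[0]

# Deleting the customer does not silently take the order history with it.
try:
    conn.execute("DELETE FROM customer WHERE id = 1")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True
```

The address is stored once no matter how many orders ship to it, a query for the customer does not drag the orders along, and the attempted delete is rejected rather than cascading through the order history.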

To put it simply, experience has taught the software world that the relational model is a better way to model data than the hierarchical model. Unfortunately, in the rush to embrace XML many are repeating the mistakes from decades ago in the new millennium.


 

Categories: XML

XML.com recently ran an article entitled Document-Centric .NET, that highlights the various technologies for working with XML that exist in the .NET Framework. The article provides a good high level overview of the various options you have for processing XML in the .NET Framework. The article includes an all important caveat which I wish more people knew about and which I keep wanting to write an article about but never get around to doing. The author writes 

However, keep in mind that there are W3C XML Schema features that are not directly compatible with .NET's XML-to-database and XML-to-object mapping tools.

This is very true. Besides our schema validation technologies, most Microsoft technologies or products that utilize W3C XML Schema support a subset of the language due to impedance mismatches between the language and the underlying data model or type system of the target environment.

In fact the only complaint I have about the article is a nitpick about its title. In XML circles, document-centric implies a usage of XML that isn't borne out by his article. If you are interested in the difference between data-centric XML and document-centric XML you should read my article Can One Size Fit All? in XML Journal. In that article I talk about the differences between XML that is used to represent rigidly structured tabular data (e.g., relational data or serialized objects) and XML that is used to represent semi-structured data (e.g., office documents). The former is data-centric XML while the latter is document-centric.

 


 

Categories: Mindless Link Propagation | XML

I recently stumbled on an entry by Lucas Gonze where he complains about the RSS <enclosure> element. He writes

Problems with the enclosure element:

  • It causes users to download big files that they will never listen to or watch, creating pointless overload on web hosts.
  • It doesn't allow us to credit the MP3 host, so we can't satisfy the netiquette of always linking back.
  • For broadband users, MP3s are not big enough to need advance caching in the first place.
  • The required content-type attribute is a bad idea in the first place. Mime settings are already prone to breakage, adding an intermediary will just create another source of bugs. There are no usecases for this attribute that can't be more easily and robustly satisfied by having clients HEAD the URL for themselves.
  • The required content-length attribute should not be there. It requires people who link to MP3s to HEAD them and calculate the length, which is sometimes not practical. It makes variable-length MP3s illegal. There are no usecases for this attribute that can't be more easily and robustly satisfied by having clients HEAD the URL for themselves.

The primary problem with the <enclosure> element is that it is overspecified. Having an element that says, here is a pointer to some data that is related to this entry that is too large to fit in the feed is a good idea. Similarly providing a hint at what the MIME type is so the reader knows whether it can handle that MIME type or can display something specific to that media type in the user interface without making an additional request to the server is very useful. The description of the enclosure element in RSS 2.0 states

<enclosure> sub-element of <item> 

<enclosure> is an optional sub-element of <item>.

It has three required attributes. url says where the enclosure is located, length says how big it is in bytes, and type says what its type is, a standard MIME type.

The url must be an http url.

<enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3" length="12216320" type="audio/mpeg" />

Syndication geeks might notice that this is akin to the <link> element in the ATOM 0.3 syndication format which is described as

3.4  Link Constructs

A Link construct is an element that MUST NOT have any child content, and has the following attributes:

3.4.1  "rel" Attribute

The "rel" attribute indicates the type of relationship that the link represents. Link constructs MUST have a rel attribute, whose value MUST be a string, and MUST be one of the values enumerated in the Atom API specification http://bitworking.org/projects/atom/draft-gregorio-09.html.

3.4.2  "type" Attribute

The "type" attribute indicates an advisory media type; it MAY be used as a hint to determine the type of the representation which should be returned when the URI in the href attribute is dereferenced. Note that the type attribute does not override the actual media type returned with the representation.

Link constructs MUST have a type attribute, whose value MUST be a registered media type [RFC2045].

3.4.3  "href" Attribute

The "href" attribute contains the link's URI. Link constructs MUST have a href attribute, whose value MUST be a URI [RFC2396].

xml:base [W3C.REC-xmlbase-20010627] processing MUST be applied to the atom:url element.

3.4.4  "title" Attribute

The "title" attribute conveys human-readable information about the link. Link constructs MAY have a title attribute, whose value MUST be a string.

So the ideas behind the <enclosure> element were good enough that they appear in ATOM with some additional niceties and a troublesome bit (the length attribute) removed. So if the concepts behind the <enclosure> element are so good that they are first class members of the ATOM syndication format, why does Lucas not like it? The big problem with RSS enclosures is how Dave Winer expected them to be used. An aggregator was supposed to act like a TiVo, automatically downloading files in the background and presenting them to you when it's done. The glaring problem with doing this is that it means lots of people are automatically downloading large files that they didn't request, which is a significant waste of bandwidth. In fact, most aggregators either do not support enclosures or simply show them as links, which is what FeedDemon and RSS Bandit (with the Outlook 2K3 skin) do. The funny thing is that the actual RSS specification doesn't describe this behavior; instead this behavior is implied by Dave Winer's descriptions of use cases.

Lucas also complains about the required length attribute, which is problematic if you are pointing to a file on a server you don't own because you have to first download the file or perform an HTTP HEAD to get its size. The average blogger isn't going to go through that kind of trouble. Although tools could help, it makes sense for the length attribute to have been an optional hint.

I have to disagree with Lucas's complaints about putting the MIME type in the <enclosure> element. He complains that the MIME type in the <enclosure> could be wrong and in fact that in many cases web servers serve a file with the wrong MIME type. Thus he concludes that putting the MIME type in the enclosure is wrong. Client software should be able to decide how to react to the enclosure [e.g. if it is audio/mpeg display a play button] without having to make additional HTTP requests, especially since, as Lucas points out, it is not 100% guaranteed that performing an HTTP HEAD of the linked file will actually get you the correct MIME type from the web server.
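
As a rough sketch of what I mean, an aggregator can choose how to present an enclosure purely from the advisory type attribute. The function name and the action labels below are invented for illustration and are not taken from any real aggregator; the sample item reuses the enclosure from the RSS 2.0 description quoted earlier.

```python
import xml.etree.ElementTree as ET

# Sample item wrapping the enclosure from the RSS 2.0 description quoted earlier.
ITEM = """<item>
  <title>Weather Report Suite</title>
  <enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3"
             length="12216320" type="audio/mpeg" />
</item>"""

def enclosure_action(item_xml):
    """Choose a UI action from the enclosure's type hint, with no HTTP request.

    Hypothetical helper: the action labels are made up for this sketch.
    """
    enclosure = ET.fromstring(item_xml).find("enclosure")
    if enclosure is None:
        return ("none", None)
    url = enclosure.get("url")
    media_type = (enclosure.get("type") or "").lower()
    if media_type.startswith("audio/"):
        return ("play-button", url)
    if media_type.startswith("image/"):
        return ("thumbnail", url)
    # Unknown or missing type: fall back to showing a plain link.
    return ("plain-link", url)

print(enclosure_action(ITEM))
```

Nothing is downloaded until the user actually asks for the file, which also sidesteps the TiVo-style bandwidth problem described above.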

In conclusion, I agree that the <enclosure> element is problematic but most of the problems are due to the implied use case suggested by the spec author, Dave Winer, as opposed to the actual information provided by the element. The ATOM approach of describing the information provided by each element in a feed but not explicitly describing the expected behavior of clients is a welcome approach. Of course, there will always be developers who require structure or take an absence of explicit guidelines to mean do stupid things (like aggregators that fetch your feed every 5 minutes)  but these are probably better handled in "Best Practices" style documents or test suites than in the actual specification.


 

Categories: XML