One of the biggest concerns about RSS is the amount of bandwidth consumed by wasteful requests. Recently on an internal mailing list discussion there was a complaint about the amount of bandwidth wasted because weblog servers send a news aggregator an RSS feed containing items it has already seen. A typical news feed contains 10 - 15 news items where the oldest is a few weeks old and the newest is a few days old. A typical user has used their news aggregator to fetch the an RSS feed about once every other day. This means on average at least half the items in an RSS feed are redundant to people who are subscribed to the feed yet everyone (client & server) incurs bandwidth costs by having the redundant items appear in the feeds.

So how can this be solved? All the pieces to solve this puzzle are already on the table. Every news aggregator worth it's salt (NetNewsWire, SharpReader, NewsGator, RSS Bandit, FeedDemon, etc) uses HTTP Conditional GET requests. What does that mean in English? It means that most aggregators send information about when last they retrieved the RSS feed via the If-Modified-Since HTTP header and a the hashcode of the RSS feed provided by the server the last time it was fetched via the If-None-Match HTTP header. The interesting point is that although most news aggregators tell the server the last time they fetched the RSS feed almost no weblog server I am aware of actually uses this information to tailor the information sent back in the RSS feed. The weblog software I use is guilty of this as well.

If you fetched my RSS feed yesterday or the day before there is no reason for my weblog server to send you a 200K file containing five entries from last week which it currently does. Actually it is worse, currently my weblog software doesn't even perform the simple check of seeing whether there are any new items before choosing to send down a 200K file.

Currently this optimization is the one performed by weblog servers, if there are no new items then a HTTP 304 response is sent otherwise a feed containing the last n items is sent. A further optimization is possible where the server only sends down the last n items newer than the If-Modified-Since date sent by the client.

I'll ensure that this change makes it into the next release of dasBlog (the weblog software I use) and if you use weblog software I suggest requesting that your software vendor to do the same.

UPDATE: There is a problem with the above proposal in that it calls for a reinterpretation of how If-Modified-Since is currently used by most HTTP clients and directly violates the HTTP spec which states

b) If the variant has been modified since the If-Modified-Since
         date, the response is exactly the same as for a normal GET.

The proposal is still valid except that this time instead of misusing the If-Modified-Since header I'd propose that clients and servers respect a new custom HTTP header such as "X-Feed-Items-New-Than"  whose value would be a date in the same format as that used by the If-Modified-Since header.


 

Categories: XML

November 5, 2003
@ 02:51 AM

The following screenshot reminded me of recent comments by Robert Scoble on blogs and conversational marketing


 

Categories: Ramblings

In response to my earlier post on his "Replace & Defend" theory Jon Udell writes

We have yet to even scratch the surface of what's possible given these circumstances. And now here comes WinFS with its own proprietary schema language. In recent years, it's been popular to layer innovation on top of base standards. So XSLT, XQuery, and SQL200n all rely on XPath, as WSDL relies on XSD. Yet no base standards beyond XML itself were of use to WinFS? It puzzles me. The things defined in WinFS don't seem exotic or mysterious. "A WinFS Contact type," the docs say, "has the Item super type. Person, Group, and Organization are some of its subtypes." If XSD can't model such things, we're in real trouble.

I doubt that anyone is claiming that W3C XML Schema cannot model containment or type derivations, however the way it does implements type derivation leaves much to be desired. In fact, this is the topic of an article I wrote that showed up on XML.com last week entitled XML Schema Design Patterns: Is Complex Type Derivation Unnecessary?. This is just another example of how things that seem straightforward end up being fairly complicated in W3C XML Schema. As Don Box puts it XML Schema has already eclipsed C++ in terms of complexity. Given that the WinFS schema language isn't even about modelling XML documents it seems perplexing that one would expect it to take on the complexity of using W3C XML Schema as its modelling language.

Of course WinFS does much more than model datatypes and structures. It's a highly sophisticated storage system that supports relational, object, and XML access styles, and that treats relationships among items as first-class objects in themselves (a potent feature I first encountered in the object database world years ago.) Great stuff! But the terminology of the Longhorn docs is revealing. Person, Contact, and Organization items are referred to as "Windows types," presumably because their schemata appear as classes in Longhorn's managed API. But to me these are universal types, not Windows types. I had expected them to be defined using XML Schema, and to be able to interoperate directly with SOAP payloads and XML documents on any platform.

Being defined using W3C XML Schema and being able to interoperate directly with XML documents on any platform are orthogonal. Information in relational databases like SQL Server is described using relational schema languages (i.e. SQL) however this hasn't stopped Microsoft from creating myriad ways to extract XML from SQL Server such as SQLXML, FOR XML queries and  the .NET Framework's DataSet class which information stored in relational databases to interoperate directly with SOAP payloads and XML documents. No one would claim that the fact that the data in a relational database is not defined using W3C XML Schema (or RELAX NG) makes it impossible to extract XML from a relational database or view it as an XML data source. WinFS is no different.

It's troubling, though, that the architects must be consulted to find out whether Longhorn's "Windows types" will be transferable to standards-based software.

I don't work on WinFS so there was no chance I'd make a definitive statement about what features they plan to support or not. This is simply common sense on my part  not an indication one way or the other about the degree of XML support in WinFS. With any luck I'll soon be able to get one of the WinFS folks to start blogging and then more accurate information can be gotten from the horses mouth.


 

Categories: Life in the B0rg Cube

I was planning to write this month's Extreme XML column on the recently released EXSLT.NET implementation produced by myself and a couple of others. One of the cool things about the EXSLT.NET project is that we added the ability to use the EXSLT extension functions in XPath queries over any data source that provides an XPathNavigator (i.e. implements IXPathNavigable). Thus one would be able to use functions like set:distinct and regexp:match when running XPath queries over objects that implement the IXPathNavigable interface such as the XPathDocument, XmlDocument or XmlDataDocument.  

In constructing my examples I decided that it would be even cooler to show the extensibility of the .NET Framework if I showed how one could use the XPath extension functions in queries over implementations of XPathNavigator not provided by the .NET Framework such as my perennial favorite, the ObjectXPathNavigator.

After fixing some bugs in the ObjectXPathNavigator implementation on MSDN (MoveToParent() didn't take you to the root node from the document element and the navigator only exposed public properties but not public fields) I came across a problem which will probably turn into yet another project on GotDotNet workspaces. The heuristics the ObjectXPathNavigator uses to provide an XML view of an arbitrary object graph doesn't take into account the class annotations used by XML Serialization in the .NET Framework. Basically this means that if one reads in an XML document, converts it to objects using the XmlSerializer then creates an ObjectXPathNavigator over the objects...the XML view of the object provided by the ObjectXPathNavigator would not be the same as the XML generated when the class is serialized as XML via the XmlSerializer.

In fact for the ObjectXPathNavigator to provide the same XML view of an objects as the XmlSerializer would involve having it understand the various attributes for annotating classes from the System.Xml.Serialization namespace. Considering that in the future, the XPathNavigator should be the primary API for accessing XML in the .NET Framework it would be extremely quite useful if there was an API that allowed any object to be treated as a first class citizen of the XML world. The first step was the XmlSerializer which allowed any class to be saved and loaded to and from XML streams, the next step should be enabling any object to be accessed in the same way XML documents are as well. Instant benefits are things like the ability to perform XPath and XSLT over arbitrary objects. In the Whidbey/Yukon (Visual Studio v.next/SQL Server v.next) timeframe this means getting stuff like XQuery over objects  or the ability to convert any object graph to an XmlReader for free.

It looks like I have a winter project, but first I have to finish this month's column on EXSLT.NET. *sigh*


 

Categories: XML

November 3, 2003
@ 04:04 AM

I just spent two hours trying to figure out why a chunk of code was throwing exceptions when compiled and run from my command line application but similar code worked fine when running in Visual Studio. The problem turned out to be a bug in the .NET Framework's XslTransform class which existed in v1.0 of the .NET Framework but not v1.1. Since I was using Visual Studio.NET 2003 [which uses v1.1 of the .NET Framework] to run my test app but compiling my actual application with the compiler for v1.0 of the .NET Framework I wasn't actually doing an apples-for-apples comparison when running both apps.

I'm tempted to uninstall v1.0 of the .NET Framework from my machine so I don't end up with facing this problem again. What a waste of time.

 


 

Categories: Ramblings

Jon Udell writes

Reading the Longhorn SDK docs is a disorienting experience. Everything's familiar but different.

Example 3: The new XSD:

"WinFS" Schema Definition Language "WinFS" introduces a schema definition language to describe "WinFS" types. This language is an XML vocabulary. "WinFS" includes a set of schemas that define a set of Item types and NestedElement types. These are called Windows types.

Yeah, "embrace and extend" was so much fun, I can hardly wait for "replace and defend." Seriously, if the suite of standards now targeted for elimination from Microsoft's actively-developed portfolio were a technological dead end, ripe for disruption, then we should all thank Microsoft for pulling the trigger. If, on the other hand, these standards are fundamentally sound, then it's a time for what Clayton Christensen calls sustaining rather than disruptive advances. I believe the ecosystem needs sustaining more than disruption. Like Joe, I hope Microsoft's bold move will mobilize the sustainers.

I can't speak for the other technologies Jon but being the Program manager responsible for XML schema technologies in the .NET Framework [as well as the fact that the offices of a number of WinFS folks are a few doors away from mine] I can speak on the above example.

The first thing I'd like to note is that Jon's example is that the W3C XML Schema definition language (XSD) is far from being targetted for elimination from Microsoft's actively-developed portfolio? We have almost a dozen technologies and products that utilize XSD in some way shape or form; SQL Server [Yukon], Visual Studio.NET, Indigo, Word, Excel, InfoPath, SQLXML, the .NET framework's DataSet, BizTalk, FoxPro, the .NET Framework's XmlSerializer and a couple of others. The fact that one technology or product decides that it makes more sense to create an XML vocabulary that meets their specific needs instead of shoehorning an inappropriate technology into their use case and thus causing pain for themselves and their customers is a wise decision that should be lauded (In fact,this is the entire point XML was invented in the first place, see my "SGML on the Web" post) instead of being heralded as another evil conspiracy by Microsoft.

Now to go into specifics. The WinFS schema language and W3C XML Schema do two fundamentally different things; W3C XML Schema is a way for describing the structure and contents of an XML document while a WinFS schema describes types and relationships of items stored in WinFS. At first glance, the only thing that connects both schema languages is that they are both written in XML. Of course, this is just syntax so this similarity isn't any more significant than the fact that both SQL and Java use English keywords.

Where it gets interesting is if one asks whether the WinFS data model can be mapped to XML and then XSD used to describe the structure of this XML view of WinFS. This is possible but leads to impedance mismatches due to the differences between the WinFS model and that used by W3C XML Schema. Specifically there are constructs in W3C XML Schema that don't map to concepts in WinFS and concepts in WinFS that don't map to constructs in W3C XML Schema. So for WinFS to use W3C XML Schema as the syntax for its schema language it would have to do two things

  1. Support a subset of W3C XML Schema
  2. Extend W3C XML Schema to add support for WinFS concepts

The problem with this approach is that it leads to complaints of a different kind. The first being that there is user confusion because not every valid W3C XML Schema construct is usable in this context and the other being that W3C XML Schema will end up be extended in a proprietary manner which eventually leads to yells of "embrace and extend". By the way, this isn't guesswork on my part this is a description of what happened when Microsoft took this approach with the .NET Framework's DataSet class and SQLXML. History has taught us that these approaches were unwise and used W3C XML Schema outside of its original goals and usage scenarios. Instead, we have moved back to the original goals of XML where instead of relying on one uber-vocabulary for all ones needs, vocabularies specific to various situations are built but can be translated or transformed as needed when moving from one domain to another.

Ideally, even though WinFS has its own schema language it makes sense that it should be able to import or export WinFS items as XML described using an W3C XML Schema since this is the most popular way to transfer structured and semi-structured data in our highly connected world. This is functionality is something I've brought up with the WinFS architects which they have stated will be investigated.


 

Categories: Life in the B0rg Cube

Drew Marsh blogged about the talk given by my boss at this year's Microsoft Professional Developer's Conference (PDC) entitled What's New In System.Xml For Whidbey?". Since I'm directly responsible for some of the stuff mentioned in the talk I though it would make sense if I made some clarifications or added details where some where lacking from his coverage.

Usability Improvements (Beta 1)

  • CLR type accesors on XmlReader, XmlWriter and XPathNavigator: Double unitPrice = reader.ValueAsDouble

 

This was a big gripe from folks in v1.0 of the .NET Framework that they couldn't access the XML in a validated document as a typed value, this is no longer the case in Whidbey. However people who want this functionality will have to move to the XPathDocument instead of the XmlDocument. People will be able to get typed values from an XmlDocument (actually from anything that implements IXPathNavigable) but actually storing the data in the in-memory representation as a typed value will only be available on the XPathDocument.  

 

XPathDocument A Better XML DOM

"XmlDocument is dead."

 

  • XPathDocument replaces the XmlDocument as the primary XML store.
  • Feature Set
    • 20%-40% more performant for XSLT and Xquery
    • Editing capabilities through the XPathEditor (derives from XPathDocument) using an XmlWriter (the mythical XmlNodeWriter we've all been searching for).
    • XML schema validation
    • Strongly typed store. Integers stored as int internally (per schema) (Beta 1)
    • Change tracking at node level
    • UI databinding support to WinForms and ASP.NET controls (Beta 1)

Yup, in v1.0 of the .NET Framework we moved away from a push-based parser (SAX) in MSXML to a pull-based parser (XmlReader) in the .NET Framework. In v2.0 of the .NET Framework there's been a similar shift, from the DOM data model & tree based APIs for accessing XML to the XPath data model & cursor based APIs for accessing XML. If you are curious about some of the thinking that went into this decision you should take a look at my article in XML Journal entitled Can One Size Fit All? 

 

Note: XPathDocument2 in PDC bits will be XPathDocument once again by Beta 1. "We were at an unfortunate design stage at the point where the PDC bits were created."

Yeah, things were in flux for a while during our development process. The features of the class called XPathDocument2 in the PDC builds will be integrated back into the XPathDocument class that was in v1.0 of the .NET Framework.

 

The rest of the stuff in the talk (XQuery, new XML editor in Visual Studio.NET, ADO.NET with SQLXML, etc) isn't stuff I'm directly responsible for so I hesitate to comment further, however Drew has taken excellent notes about them so it is clear which direction we're going in for Whidbey.

 

 

 


 

Categories: Life in the B0rg Cube | XML

The third in my semi-regular series of guidelines for working with W3C XML Schema for XML.com is now up. The article is entitled XML Schema Design Patterns: Is Complex Type Derivation Unnecessary? and the article is excerpted below for those who may not have the time to read the entire article

INTRODUCTION

W3C XML Schema (WXS) possesses a number of features that mimic object oriented concepts, including type derivation and polymorphism. However real world experience has shown that these features tend to complicate schemas, may have subtle interactions that lead tricky problems, and can often be replaced by other features of WXS. In this article I explore both derivation by restriction and derivation by extension of complex types showing the pros and cons of both techniques, as well as showing alternatives to achieving the same results

MIDDLE

As usage of XML and XML schema languages has become more widespread, two primary usage scenarios have developed around XML document validation and XML schemas.

  1. Describing and enforcing the contract between producers and consumers of XML documents: ...
  2. Creating the basis for processing and storing typed data represented as XML documents: ...

CONCLUSION

Based on the current technological landscape the complex type derivation features of WXS may add more problems than they solve in the two most commmon schema use cases. For validation scenarios, derivation by restriction is of marginal value, while derivation by extension is a good way to create modularity as well as encourage reuse. Care must however be taken to consider the ramifications of the various type substitutability features of WXS (xsi:type and substitution groups) when using derivation by extension in scenarios revolving around document validation.

Currently processing and storage of strongly typed XML data is primarily the province of conventional OOP languages and relational databases respectively. This means that certain features of WXS such as derivation by restriction (and to a lesser extent derivation by extension) cause an impedance mismatch between the type system used to describe strongly typed XML and the mechanisms used for processing and storing said XML. Eventually when technologies like XQuery become widespread for processing typed XML and support for XML and W3C XML Schema is integrated into mainstream database products this impedance mismatch will not be important. Until then complex type derivation should be carefully evaluated before being used in situations where W3C XML Schema is primarily being used as a mechanism to create type annotated XML infosets.


 

Categories: XML

October 29, 2003
@ 01:57 PM

A recent article by Phil Howard of Bloor Research on IT-Director.com talks about the Demise of the XML Database. Excerpts below

While you can still buy an XML database purely because it provides faster storage capability and greater functionality than a conventional database, all the erstwhile XML database vendors are increasingly turning to other sources of use for their products.

These other markets basically consist of two different sectors: the use of XML databases as a part of an integration strategy, where the database is used to provide on-the-fly translation for XML documents, and for content management...

The reason why there is this trend away from pure XML storage is because advanced XML capabilities are being introduced by all the leading relational vendors. 

This has been considered "fighting words" from some in the XML database camp such as Mike Champion (works on Tamino XML database) and Kimbro Staken (one of the originators of Apache Xindice). Mike Champion comes up with a number of counter-arguments to the claims in the article I found interesting and felt compelled to comment on. According to Mike

  • It is widely believed that less than a quarter of enterprise data is currently stored in RDBMS systems. This suggests that the market is not "making do" with what the relational database products offer today, but using a wide variety of technologies.

This is actually the mantra of the team I work for at Microsoft. We are responsible for data access technologies (Relational, Object and XML) and our GM is fond of trotting out the quote about "less than a quarter of enterprise data is currently stored in a relational database". A lot of data important to businesses is just siting around on file systems in various Microsoft Office documents and other file formats. The bet across the software industry is that moving all this semi-structured business documents to XML is the right way to go and the first step has been achieved given that modern business productivity software (including the Open Source ones) are moving to fully supporting XML for their document formats. Step one is definitely to get all those memos, contracts and spreadsheets into XML.

  • The main reason OODBMS didn't hit the sweet spot, AFAIK, is that they created a tight coupling between application code and the DBMS. Potential performance gains this allows can outweigh the maintenance challenges in extremely business critical, high transaction volume environments...XML DBMS, on the other hand, inherit XML's suitability for loosely coupling systems, applications, and tools across a wide range of environments.

Totally agree here about the weakness of OODBMSs in creating a tight coupling between applications and the data they accessed. For a more in-depth description of the disadvantages of object oriented databases in comparison to their relational counterparts you can read my article An Exploration of Object Oriented Database Management Systems.

  • Again AFAIK (having only played with OODBMS personally), there is relatively little portability across OODBMS systems; code written for one would be very expensive to adapt to another. Investing in the technology required one to make a risky bet on the vendor who supplied it. This created an environment where the object-relational vendors could prosper by offering only a subset of the features but the absolute assurance that they would be in business for years to come. In the XML DBMS world, on the other hand, all support roughly the same schema, query language, and API standards;

There are two points Mike is making here

  1. There is very little portability across OODBMS systems.
  2. In the XML DBMS world, on the other hand, all support roughly the same schema, query language, and API standards

Based on my experiences with OODBMSs the first claim is entirely accurate, moving data from one OODBMS system was a pain and there was a definitle lack of standardization of APIs and query languages across various products. The second claim is rather suspect to me. I am unaware of any schema, query or API standards that are supported uniformly across XML database products. This isn't to say there aren't standardized W3C branded XML schema languages or query languages nor that there haven't been moves to come up with standard XML database APIs but when last I looked these weren't uniformly supported across many the XML database products and where they were there was a distinct lack of maturity in their offerings. Granted it's been almost a year since  I last looked.

However there is an obvious point about portability that Mike doesn't mention (perhaps because it is so obvious). The entire point of XML is being portable and interoperability, moving data from one XML database to another should simply be a case of "export database as XML" from one and "import XML into database" on the other.

  • The standards of the XML world provide a clearly defined and fairly high bar for those who would seek to take away the market pioneered by the XML DBMS vendors. For better or worse, the XML family of specs is complex and quite challenging to support efficiently in a DBMS system. It's one thing to support, as the RDBMS vendors now do quite well, XML views of structured, typed, relatively "flat" data such as are typically found in RDBMS applications. It is quite another to efficiently and scalably support queries and updates on "document-like" XML with relatively open content models, lots of recursion, mixed content, and where wildcard text comparisions are more frequent than typed value comparisons. The dominant DBMS vendors obviously have talent and money to throw at the problem, but analysts should not assume that they will surpass theese capabilities of the XML DBMS systems anytime soon

OK, this one sounds like FUD. Basically Mike seems to be saying the family of XML specs is so complex (thanks to the W3C, but that's another story) that companies like Oracle, IBM and Microsoft won't be able to come up with ways to query semi-structured data efficiently or perform text comparison searches well so you are best of sticking to a seperate database for your XML data instead of having all your data stored in a single unified store.

So what is my position on the death of native XML databases? Like Phil Howard, I suspect that once XML support becomes [further] integrated into mainstream relational databases (which it  already has to some degree) then native XML databases will be hard pressed to come up with reasons why one would want to buy a separate product for storing XML data distinct from the rest of the data for a business when a traditional relational database can store it all. It's all about integration. Businesses prefer buying a single office productivity suite than mixing and matching word processors, spreadsheets and presentation programs from different vendors. I suspect the same is true when it comes to their data storage needs.


 

Categories: XML

October 29, 2003
@ 12:49 PM
Beware the Hoover Dustette
 

Categories: Ramblings