Daniel Cazzulino has been writing about his work on XML Streaming Events (XSE), which combines the ability to do XPath queries with the .NET Framework's forward-only, pull-based XML parser. He shows the following code sample:

    // Setup the namespaces
    XmlNamespaceManager mgr = new XmlNamespaceManager(temp.NameTable);
    mgr.AddNamespace("r", RssBanditNamespace);

    // Precompile the strategy used to match the expression
    IMatchStrategy st = new RootedPathFactory().Create(
        "/r:feeds/r:feed/r:stories-recently-viewed/r:story", mgr);

    int count = 0;

    // Create the reader.
    XseReader xr = new XseReader( new XmlTextReader( inputStream ) );

    // Add our handler, using the strategy compiled above.
    xr.AddHandler(st, delegate { count++; });

    while (xr.Read()) { }

    Console.WriteLine("Stories viewed: {0}", count);

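I have a couple of questions about his implementation, the main one being how it deals with XPath queries such as /r:feeds/r:feed[count(r:stories-recently-viewed)>10]/r:title, which can't be done in a forward-only manner: whether an r:title matches isn't known until every r:stories-recently-viewed child of its parent r:feed has been counted, and those children may come after the r:title in the document.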
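To make the problem concrete, here is a minimal sketch of evaluating that query by hand over a plain XmlReader. It is not Daniel's XSE implementation and not the XPathReader; the class name, method name and namespace URI are made up for illustration, and it returns the text of the matching titles rather than nodes. The point is only that each r:title has to be held in a buffer until the end tag of its r:feed is reached.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class BufferedPredicateSketch
{
    // Hypothetical namespace URI, used only for this illustration.
    public const string Ns = "urn:example:feedlist";

    // Hand-written equivalent of
    //   /r:feeds/r:feed[count(r:stories-recently-viewed) > threshold]/r:title
    // using a single forward pass. Titles are buffered per feed because the
    // predicate cannot be decided until the feed's end tag is seen.
    public static List<string> TitlesOfBusyFeeds(TextReader input, int threshold)
    {
        var results = new List<string>();
        var pendingTitles = new List<string>();   // the unavoidable buffer
        var titleText = new StringBuilder();
        int viewedCount = 0;
        bool inFeed = false, inTitle = false;

        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                bool inNs = reader.NamespaceURI == Ns;
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (reader.Depth == 0 && !(inNs && reader.LocalName == "feeds"))
                            return results;           // document element is not r:feeds
                        if (reader.Depth == 1)
                        {
                            // Start of a new child of r:feeds: reset per-feed state.
                            inFeed = inNs && reader.LocalName == "feed";
                            pendingTitles.Clear();
                            viewedCount = 0;
                        }
                        else if (inFeed && reader.Depth == 2 && inNs
                                 && reader.LocalName == "stories-recently-viewed")
                        {
                            viewedCount++;
                        }
                        else if (inFeed && reader.Depth == 2 && inNs
                                 && reader.LocalName == "title")
                        {
                            titleText.Clear();
                            if (reader.IsEmptyElement) pendingTitles.Add("");
                            else inTitle = true;
                        }
                        break;

                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        if (inTitle) titleText.Append(reader.Value);
                        break;

                    case XmlNodeType.EndElement:
                        if (inTitle && reader.Depth == 2)
                        {
                            pendingTitles.Add(titleText.ToString());
                            inTitle = false;
                        }
                        else if (inFeed && reader.Depth == 1)
                        {
                            // Only now is the predicate known for this feed.
                            if (viewedCount > threshold) results.AddRange(pendingTitles);
                            inFeed = false;
                        }
                        break;
                }
            }
        }
        return results;
    }

    public static void Main()
    {
        var sb = new StringBuilder();
        sb.Append("<r:feeds xmlns:r='" + Ns + "'>");
        sb.Append("<r:feed><r:title>Busy feed</r:title>");
        for (int i = 0; i < 11; i++) sb.Append("<r:stories-recently-viewed/>");
        sb.Append("</r:feed>");
        sb.Append("<r:feed><r:title>Quiet feed</r:title><r:stories-recently-viewed/></r:feed>");
        sb.Append("</r:feeds>");

        foreach (string title in TitlesOfBusyFeeds(new StringReader(sb.ToString()), 10))
            Console.WriteLine(title);   // prints "Busy feed"
    }
}
```

In the sample document the title comes before the eleven r:stories-recently-viewed elements, so a handler fired at the moment r:title is read would have no way of knowing whether to count it. A pure streaming matcher either has to reject such an expression or fall back to buffering like this.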

Oleg Tkachenko also chimes in with some opinions about streaming XPath in his post Warriors of the Streaming XPath Order. He writes:

I've been playing with such beasts, making all kinds of mistakes and finally I came up with a solution, which I think is good, but I didn't publish it yet. Why? Because I'm tired to publish spoilers :) It's based on "ForwardOnlyXPathNavigator" aka XPathNavigator over XmlReader, Dare is going to write about in MSDN XML Dev Center and I wait till that's published.

May be I'm mistaken, but anyway here is the idea - "ForwardOnlyXPathNavigator" is XPathNavigator implementation over XmlReader, which obviously supports forward-only XPath subset...

And after I played enough with and implemented that stuff I discovered BizTalk 2004 Beta classes contain much better implementation of the same functionality in such gems as XPathReader, XmlTranslatorStream, XmlValidatingStream and XPathMutatorStream. They're amazing classes that enable streaming XML processing in much rich way than trivial XmlReader stack does. I only wonder why they are not in System.Xml v2 ? Is there are any reasons why they are still hidden deeply inside BizTalk 2004 ? Probably I have to evangelize them a bit as I really like this idea.

Actually, Oleg is both closer to and farther from the truth than he realizes. Although I wrote about a hypothetical ForwardOnlyXPathNavigator in my article Can One Size Fit All? for XML Journal, my planned article, which should show up when the MSDN XML Developer Center launches in a month or so, won't be using it. Instead it will be based on an XPathReader that is very similar to the one used in BizTalk 2004; in fact, it was written by the same guy. The XPathReader works similarly to Daniel Cazzulino's XseReader but uses the XPath subset described in Arpan Desai's Introduction to Sequential XPath paper instead of adding proprietary extensions to XPath as Daniel's does.

When the article describing the XPathReader is done, it will provide source, and if there is interest I'll create a GotDotNet Workspace for the project, although it is unlikely that either I or the dev who originally wrote the code will have time to maintain it.


 

Monday, 16 February 2004 04:15:40 (GMT Standard Time, UTC+00:00)
Actually, your query

/r:feeds/r:feed[count(r:stories-recently-viewed)>10]/r:title

can be done forward-only whenever one of the following conditions is true:

(1) dynamically or statically every r:title occurs after 11 or more r:stories-recently-viewed preceding-siblings
(2) dynamically or statically there is no r:title, r:feed, or r:feeds element (with the requested hierarchical relationship)
(3) statically the truth of the predicate is known (e.g., through min-occurs/max-occurs schema info)

If you want a simpler example with fewer exceptions, try //a[//b] -- although this too can stream when either
(1) statically the predicate is known (there exists, or there does not exist, at least one b element)
or
(2) dynamically or statically one finds that all b elements occur before all a elements


Another example of a non-streaming query, courtesy Derek Denny-Brown, is
//a[@b = //c/@d]
Not only does this query not stream (unless conditions similar to the previous example are met), but even if you know the schema, you can't in general know whether it's better to buffer b attributes or d attributes (unless you have statistics about the runtime data, which usually implies that it's already buffered somewhere).

And hey, anything based on XmlReader buffers all attributes (for namespace info, even when none is required by the query).
Michael Brundage
Monday, 16 February 2004 05:12:32 (GMT Standard Time, UTC+00:00)
Michael,
None of the criteria you mention apply to that particular query, but your point is taken that schema information can be used to assess whether a particular XPath match can be done in a streaming manner or not.

PS: I got a copy of your book in the mail so will start work on a review in the next few days.
Monday, 16 February 2004 05:35:09 (GMT Standard Time, UTC+00:00)
You said you wondered how it could handle that query "which can't be done in a forward-only manner" and I'm just pointing out that sometimes that query can be done in a forward-only manner.
Michael Brundage
Monday, 16 February 2004 19:05:16 (GMT Standard Time, UTC+00:00)
Followup comment: http://weblogs.asp.net/cazzu/posts/XseNotXPath.aspx