UPDATED: Things to note about foreach and System.Xml.XPath.XPathNodeIterator

May 10, 2005

@ 06:23 PM

Oleg Tkachenko has a post about one of the changes I was involved in while the program manager for XML programming models in the .NET Framework. In the post foreach and XPathNodeIterator - finally together Oleg writes

This one little improvement in System.Xml 2.0 Beta2 is sooo cool anyway: XPathNodeIterator class at last implements IEnumerable! Such unification with .NET iteration model means we can finally iterate over nodes in an XPath selection using standard foreach statement:
XmlDocument doc = new XmlDocument();
doc.Load("orders.xml");
XPathNavigator nav = doc.CreateNavigator();
foreach (XPathNavigator node in nav.Select("/orders/order"))
    Console.WriteLine(node.Value);
Compare this to what we have to write in .NET 1.X:
XmlDocument doc = new XmlDocument();
doc.Load("../../source.xml");
XPathNavigator nav = doc.CreateNavigator();
XPathNodeIterator ni = nav.Select("/orders/order");
while (ni.MoveNext())      
  Console.WriteLine(ni.Current.Value);
Needless to say - that's the case when just a dozen lines of code can radically simplify a class's usage and improve overall developer's productivity. How come this wasn't done in .NET 1.1 I have no idea.

One of the reasons we were hesitant in adding support for the IEnumerable interface to the XPathNodeIterator class is that the IEnumerator returned by the IEnumerable.GetEnumerator method has to have a Reset method. However a run of the mill XPathNodeIterator does not have a way to reset its current position. That means that code like the following has problems

XmlDocument doc = new XmlDocument();
doc.Load("orders.xml");
XPathNodeIterator it = doc.CreateNavigator().Select("/root/*");

foreach (XPathNavigator node in it) 
  Console.WriteLine(node.Name);
						
foreach (XPathNavigator node in it) 
 Console.WriteLine(node.Value);

The problem is that after the first loop the XPathNodeIterator is positioned at the end of the sequence of nodes so the second loop should not print any values. However this violates the contract of IEnumerable and the behavior of practically every other class that implements the interface. We considered adding an abstract Reset() method to the XPathNodeIterator class in Whideby but this would have broken implementations of that class written against previous versions of the .NET Framework.

Eventually we decided that even though the implementation of IEnumerable on the XPathNodeIterator would behave incorrectly when looping over the class multiple times, this was an edge case that shouldn't prevent us from improving the usability of the class. Of course, it is probable that someone may eventually be bitten by this weird behavior but we felt the improved usability was worth the trade off.

Yes, backwards compatibility is a pain.

UPDATE: Andrew Kimball, who's one of the developers of working on XSLT and XPath technologies in System.Xml posted a comment that corrected some of my statements. It seems that some different implementation decisions were made after I left the team. He writes

"You know how I hate to contradict you, but the example you give actually does work correctly in 2.0. The implementation of IEnumerable saves a Clone of the XPathNodeIterator so that Reset() can simply reset to the saved Clone. There were a couple of limitations/problems, but neither was serious enough to forego implementing IEnumerable:

1. Performance -- 2 clones of the XPathNodeIterator must be taken, one in case Reset is called, and one to iterate over. In addition, getting the Current property must clone the current navigator so that the navigator's position is independent of the iterator's position.

2. Mid-Iteration Weirdness -- If MoveNext() is called several times on the XPathNodeIterator, and *then* GetEnumerator() is called, the enumerator will only enumerate the remaining nodes, not the ones that were skipped over. Basically, users should either use the XPathNodeIterator methods to iterate, *or* the IEnumerable/IEnumerator methods, not both."

I guess it just goes to show how quickly knowledge can get obsoleted in the technology game. :)

Categories: XML

« Protocol Independence is a Leaky Abstrac... | Home | MSN Acquires MessageCast »

Tuesday, 10 May 2005 21:55:22 (GMT Daylight Time, UTC+01:00)

Dare,

You know how I hate to contradict you, but the example you give actually does work correctly in 2.0. The implementation of IEnumerable saves a Clone of the XPathNodeIterator so that Reset() can simply reset to the saved Clone. There were a couple of limitations/problems, but neither was serious enough to forego implementing IEnumerable:

1. Performance -- 2 clones of the XPathNodeIterator must be taken, one in case Reset is called, and one to iterate over. In addition, getting the Current property must clone the current navigator so that the navigator's position is independent of the iterator's position.

2. Mid-Iteration Weirdness -- If MoveNext() is called several times on the XPathNodeIterator, and *then* GetEnumerator() is called, the enumerator will only enumerate the remaining nodes, not the ones that were skipped over. Basically, users should either use the XPathNodeIterator methods to iterate, *or* the IEnumerable/IEnumerator methods, not both.

~Andy

Andy Kimball

Comments are closed.

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for UPDATED: Things to note about foreach and System.Xml.XPath.XPathNodeIterator - Dare Obasanjo's weblog