October 27, 2003
@ 03:39 PM

So it looks like my boss, his boss, his boss's boss, and his boss's boss's boss are all out at the Microsoft Professional Developers Conference 2003 (aka PDC), where folks will get a sneak peek at the next versions of Windows, SQL Server and Visual Studio. Thus it looks like there won't be much whip cracking going on this week, so I can spend time working on my pet projects for work.

  1. XML Developer Center on MSDN: Mark Fussel recently posted complaints about the quality of some XML articles he'd read. I generally feel the same way about websites dedicated to articles about XML. Of all the developer sites devoted to XML, there are only two I've seen that aren't utter crap: XML.com and IBM's XML developerWorks site. Even these are kind of hit or miss. XML.com usually publishes about three articles a week, of which one is excellent, one is good and one is crap. That would be fine, except that the excellent article is typically about something that isn't directly applicable to what I work on. The problem with IBM's developerWorks is that all the code is Java-centric, which doesn't help me since I work with the .NET Framework.

    After seeing some of what Tim Ewald did with producing content around Microsoft technologies and XML Web Services via the Web Services Developer Center on MSDN, I talked to some of the folks at MSDN about creating something similar for XML content. This was greenlighted a while ago, but preparations for PDC have kept it from taking off until next month. In the meantime, I'll be creating my content plan and coming up with a list of authors (both Microsoft employees and non-Microsoft folks) for the new dev center.

    So far I've gotten a couple of folks lined up internally, as well as some excellent non-Microsoft folks like Daniel Cazzulino, Christoph Schittko and Oleg Tkachenko. Definitely expect some new pages on the XML Home Page on MSDN in the next few months.

  2. Sequential XPath and Pull-Based XML Parsing: At XML 2001, Arpan Desai presented a paper on Sequential XPath. Relevant bits from the paper:

    This paper will provide an explanation of and the subset of XPath which we will tentatively dub: Sequential XPath, or SXPath for ease of use. SXPath allows an event-based XML parser, such as a typical SAX-compliant XML parser, to execute XPath-like expressions without the need of more memory consumption than is normally used within a sequential pull-based parser.
    ...
    By creating a streaming XML parser which utilizes Sequential XPath, one is able to reap the inherent benefits of a streaming parser with the querying power of XPath. By defining this proper subset of XPath, we enable developers and users to utilize XML in a wide array of applications thought to be too performance sensitive for traditional XML processing.
    The code for the technology outlined above has actually been gathering dust on some hard drives at work for a while. I'm currently in the process of liberating this code so that everyone can get access to the combined benefits of pull-based parsing and XPath-based matching of nodes. Hopefully, folks will be able to download classes similar to the ones outlined in Arpan's presentation in the next few weeks. By Christmas, everyone should be able to write code similar to the following snippet, taken from Tim Bray's "XML is too Hard for Programmers":
while (<STDIN>) {
  next if (X<meta>X);
  if    (X<h1>|<h2>|<h3>|<h4>X)
  { $divert = 'head'; }
  elsif (X<img src="/^(.*\.jpg)$/i>X)
  { &proc_jpeg($1); }
  # and so on...
}
Of course, you'll have to substitute C#, VB.NET or any one of the various languages targeted at the .NET Framework for the Perl code above.
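Neither Arpan's SXPath classes nor any .NET API is shown here. As a rough illustration of the idea, here is a minimal Python sketch, assuming a tiny made-up subset of paths (`/a/b/c` from the root, or `//b/c` at any depth, with no predicates or backward axes). Because every step only looks at elements that are currently open, a match can be decided while reading forward through a pull parser (Python's standard `xml.etree.ElementTree.XMLPullParser`), with the matcher keeping just a stack of open element names. The functions `compile_pattern`, `matches`, `stream_matches` and `scan` are hypothetical, and `scan` is only a loose translation of Bray's Perl loop, with a list standing in for `proc_jpeg`.

```python
import re
import xml.etree.ElementTree as ET


def compile_pattern(expr):
    """Parse a tiny, made-up XPath-like subset: "/a/b/c" (child axis from
    the root) or "//b/c" (same steps, starting at any depth). There are no
    predicates or backward axes, so a match can be decided as soon as a
    start tag arrives."""
    anywhere = expr.startswith("//")
    steps = [step for step in expr.split("/") if step]
    if not steps:
        raise ValueError("empty pattern: " + expr)
    return anywhere, steps


def matches(pattern, stack):
    """Test a compiled pattern against the names of the currently open elements."""
    anywhere, steps = pattern
    if anywhere:
        return len(stack) >= len(steps) and stack[-len(steps):] == steps
    return stack == steps


def stream_matches(chunks, patterns):
    """Feed XML text chunks to a pull parser and yield (name, element) for
    each start tag that matches one of the named patterns. Attributes are
    available at the start tag; child content has not been read yet."""
    compiled = {name: compile_pattern(expr) for name, expr in patterns.items()}
    parser = ET.XMLPullParser(events=("start", "end"))
    stack = []  # names of open elements: the only state the matcher needs

    def drain():
        for event, elem in parser.read_events():
            if event == "start":
                stack.append(elem.tag)
                for name, pattern in compiled.items():
                    if matches(pattern, stack):
                        yield name, elem
            else:
                stack.pop()
                # Free finished content; the emptied element itself stays
                # attached to its parent, so memory is reduced, not constant.
                elem.clear()

    for chunk in chunks:
        parser.feed(chunk)
        yield from drain()
    parser.close()
    yield from drain()


HEADINGS = ("h1", "h2", "h3", "h4")
JPEG = re.compile(r"^(.*\.jpg)$", re.IGNORECASE)


def scan(chunks):
    """Hypothetical, rough analogue of Bray's Perl loop."""
    patterns = {"meta": "//meta", "img": "//img"}
    patterns.update({h: "//" + h for h in HEADINGS})
    divert, jpegs = None, []
    for name, elem in stream_matches(chunks, patterns):
        if name == "meta":          # next if (X<meta>X);
            continue
        if name in HEADINGS:        # X<h1>|<h2>|<h3>|<h4>X
            divert = "head"
        elif name == "img":         # X<img src="/^(.*\.jpg)$/i>X
            m = JPEG.match(elem.get("src", ""))
            if m:
                jpegs.append(m.group(1))  # stands in for &proc_jpeg($1)
    return divert, jpegs


if __name__ == "__main__":
    doc = ['<html><head><meta name="a"/></head><body>',
           '<h2>Title</h2><p><img src="photo.JPG"/><img src="x.png"/></p>',
           '</body></html>']
    print(scan(doc))  # ('head', ['photo.JPG'])
```

The trade-off in this sketch is that dropping predicates and backward axes (anything that would need content not yet read) is what lets the matcher work without buffering the document. Namespaced tags, which ElementTree reports as `{uri}name`, aren't handled here.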

 

Tuesday, 28 October 2003 01:13:30 (GMT Standard Time, UTC+00:00)
Unfortunately, Dare's correction is, in fact, not correct. :) This has nothing to do with pull vs. push based parsers. In hindsight, "reasonably minimal buffering parser" would have been a more apt description. *sigh*