Since I'm not at the Microsoft Professional Developers Conference, I decided to answer attendees' questions about stuff that I am directly or indirectly responsible for right here. So let's roll out the questions.

Alan Dean writes

    • You can leverage the XML support in .NET against a data source of your choice (for example, the Registry) by implementing a new XmlReader. We were pointed to work by Mark Fussell on MSDN to do this (Writing XML Providers for Microsoft .NET).

      In Whidbey we will be encouraging people to implement custom XPathNavigator instances instead of custom XmlReaders unless their situation specifically calls for forward-only processing. The ObjectXPathNavigator is an example of a custom navigator.

    • Tim indicated that the whitespace handling had been particularly useful in the field. He mentioned a gotcha with empty elements; namely that this
      <elementName></elementName>
      is actually the same as this
         <elementName>
         </elementName>

      except that the second has been pretty-printed with a CRLF and some whitespace indentation. The only way to handle this correctly in your XmlReader, however, is to use _reader.WhitespaceHandling = WhitespaceHandling.None

      This is a legacy of what I like to call our "unconformant by default" era, which is how we shipped the XML parser in v1.0 and v1.1. There were a couple of nice features, like not erroring on invalid characters and the above feature, that people needed in some cases but that shouldn't have been the default behavior, since it was extremely difficult to figure out how to turn off all the features and get a conformant XML 1.0 parser. In v2.0 we're going to a "conformant by default" mode where people have to go out of their way to read in unconformant XML, not the other way around.

    • The current pain suffered by not being able to CreateElement independent of an XmlDocument which led to the ImportNode hack to allow movement between documents. They were sufficiently noisy on this to lead me to think that this is resolved in Whidbey.

      To my knowledge this behavior will remain the same in Whidbey.
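
The empty-element gotcha Tim described above can be sketched roughly as follows. This is only an illustration using Python's standard xml.etree.ElementTree, not System.Xml, and the helper name is_effectively_empty is made up for the example. The point is that a conformant parser reports the whitespace in the pretty-printed form, so it is the application that has to decide to ignore it.

```python
import xml.etree.ElementTree as ET


def is_effectively_empty(element):
    """Return True if the element has no children and only whitespace text.

    Hypothetical helper: it mimics what ignoring whitespace nodes buys you,
    it is not part of any XML library.
    """
    return len(element) == 0 and not (element.text or "").strip()


# The compact form has no text content at all.
compact = ET.fromstring("<elementName></elementName>")

# The pretty-printed form carries a whitespace-only text node.
pretty = ET.fromstring("<elementName>\r\n   </elementName>")

# The parser sees two different documents...
documents_differ = compact.text != pretty.text

# ...but both count as empty once whitespace is ignored.
both_empty = is_effectively_empty(compact) and is_effectively_empty(pretty)
```

Comparing the raw text of the two elements is what trips people up; comparing after the whitespace has been discarded gives the answer most applications actually want.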

Kirk Allen Evans writes

"There will be another, more difficult, XML parser, and you will hear Mark Fussell talk about that later this week." - Don Box

I would hedge bets that this revolves around XQuery or something, I am looking forward to hearing what this is.

I am extremely curious about what this means myself, but I don't think Don was talking about XQuery. If anything he was probably talking about APIs, but even then we aren't changing much from what we provided in v1 in the area of the XML parser (i.e. the XmlReader class), although the implementation has been rewritten to be faster and more conformant (much props to Helena). I am puzzled about what Don meant by that statement since I've seen Mark's slides and there really isn't anything about another XML parser that users have to learn. Perhaps he meant another XML API, which is valid given that we did a ton of work on the XPathDocument for v2.0, which means there may be a lot of people moving from XmlDocument to XPathDocument for a number of reasons.

 


 

Categories: Life in the B0rg Cube

October 27, 2003
@ 03:39 PM

So it looks like my boss, his boss, his boss's boss, and his boss's boss's boss are all out at the Microsoft Professional Developers Conference 2003 (aka PDC) where folks will get a sneak peek at the next versions of Windows, SQL Server and Visual Studio. Thus it looks like there won't be much whip cracking going on this week, so I can spend time working on my pet projects for work.

  1. XML Developer Center on MSDN: Mark Fussell recently posted complaints about the quality of some articles on XML he'd recently read. I generally feel the same way about websites dedicated to articles about XML. Of all the developer sites devoted to XML there are only two I've seen that aren't utter crap: XML.com and IBM's XML developerWorks site. Even these are kind of hit or miss. XML.com usually publishes about 3 articles a week, of which one is excellent, one is good and one is crap. That would be fine except that the excellent article is typically about something that isn't directly applicable to what I work on. The problem with IBM's developerWorks is that all the code is Java-centric, which doesn't help me since I work with the .NET Framework.

    After seeing some of what Tim Ewald did with producing content around Microsoft technologies and XML Web Services via the Web Services Developer Center on MSDN, I talked to some of the folks at MSDN about creating something similar for XML content. This was green lighted a while ago, but preparations for PDC have stopped this from taking off until next month. In the meantime, I'll be creating my content plan and coming up with a list of authors (both Microsoft employees and non-Microsoft folks) for the new dev center.

    So far I've gotten a couple of folks lined up internally as well as some excellent non-Microsoft folks like Daniel Cazzulino, Christoph Schittko and Oleg Tkachenko. Definitely expect some changes to the XML Home Page on MSDN in the next few months.

  2. Sequential XPath and Pull Based XML Parsing: In 2001, Arpan Desai presented on Sequential XPath at XML 2001. Relevant bits from the paper:

    This paper will provide an explanation of and the subset of XPath which we will tentatively dub: Sequential XPath, or SXPath for ease of use. SXPath allows a event-based XML parser, such as a typical SAX-compliant XML parser, to execute XPath-like expressions without the need of more memory consumption than is normally used within a sequential pull-based parser.
    ...
    By creating a streaming XML parser which utilizes Sequential XPath, one is able to reap the inherent benefits of a streaming parser with the querying power of XPath. By defining this proper subset of XPath, we enable developers and users to utilize XML in a wide array of applications thought to be too performance sensitive for traditional XML processing.
    The code for the technology outlined above has actually been gathering dust on some hard drives at work for a while. I'm currently in the process of liberating this code so that everyone can get access to the combined benefits of pull-based parsing and XPath-based matching of nodes. Folks should be able to download classes similar to the ones outlined in Arpan's presentation in the next few weeks. Hopefully by Christmas, everyone will be able to write code similar to the following snippet taken from Tim Bray's XML is too Hard for Programmers:
while (<STDIN>) {
  next if (X<meta>X);
  if    (X<h1>|<h2>|<h3>|<h4>X)
  { $divert = 'head'; }
  elsif (X<img src="/^(.*\.jpg)$/i>X)
  { &proc_jpeg($1); }
  # and so on...
}
Of course you'll have to substitute C#, VB.NET or any one of the various languages targeted at the .NET Framework for the Perl code above.
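
To give a feel for the general idea of matching paths while streaming, here is a minimal sketch in Python using the standard library's iterparse. It is not Arpan's SXPath and not the classes mentioned above. The function name stream_match is made up for the example, and it assumes only absolute, child-axis-only paths such as /feed/entry/title with no predicates, wildcards or attributes.

```python
import io
import xml.etree.ElementTree as ET


def stream_match(source, path):
    """Yield the text of every element whose absolute path equals `path`.

    Hypothetical helper: supports only simple paths like "/a/b/c".
    """
    target = [step for step in path.split("/") if step]
    stack = []  # names of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            # The element's text is only guaranteed complete at its end event.
            if stack == target:
                yield (elem.text or "").strip()
            stack.pop()
            # Release the element's text and children. The emptied element
            # itself stays attached to its parent until the parent is cleared.
            elem.clear()


doc = (b"<feed><entry><title>One</title></entry>"
       b"<entry><title>Two</title></entry><title>Top</title></feed>")
titles = list(stream_match(io.BytesIO(doc), "/feed/entry/title"))
```

The only state kept between events is the stack of open element names, which is what makes this kind of restricted path matching workable on top of a forward-only parser.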

 

Categories: XML

October 25, 2003
@ 05:47 PM

Get it here

Differences between v1.1.0.36 and v1.2.0.42 below.

  • Support for password protected feeds using either HTTPS/SSL or HTTP Authentication. This feature can be tested using Steven Garrity's test feeds.
  • The ability to store and retrieve feed list from remote locations such as a dasBlog blog, an FTP server or a network file share. This enables users utilizing RSS aggregators on multiple machines to synchronize their feed list from a single point. This feature has been called a subscription harmonizer by some.
  • Multiple feeds downloaded simultaneously instead of one at a time thus reducing download time.
  • When saving as OPML, the hierarchy of the feed list is preserved instead of writing out a flat structure.
  • Default theme for viewing items changed to resemble that of a mail reader like Outlook Express. 
  • Added support for <dc:author> and <author> elements to a number of templates including the default theme.
  • FIXED: Feed list corruption when importing an OPML file where xmlUrl="" for some feeds
  • FIXED: NullReferenceException involving streams when accessing feeds after RSS Bandit has been running for a long time.
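
For the curious, preserving the feed list hierarchy in OPML amounts to nesting outline elements by category. The sketch below is in Python and is not RSS Bandit's actual code. The function name feeds_to_opml, the "/"-separated category paths, and the use of title and xmlUrl attributes on every outline are assumptions made for the example, and the OPML head element is left out for brevity.

```python
import xml.etree.ElementTree as ET


def feeds_to_opml(feeds):
    """Build an OPML tree with feeds nested under their category outlines.

    `feeds` is a list of (category_path, title, xml_url) tuples, where
    category_path is a hypothetical "/"-separated string such as "Tech/XML".
    An empty category_path puts the feed at the top level.
    """
    opml = ET.Element("opml", version="1.0")
    body = ET.SubElement(opml, "body")
    for category_path, title, xml_url in feeds:
        parent = body
        for name in filter(None, category_path.split("/")):
            # Reuse an existing category outline (one without xmlUrl) if present.
            folder = next(
                (o for o in parent.findall("outline")
                 if o.get("title") == name and o.get("xmlUrl") is None),
                None,
            )
            if folder is None:
                folder = ET.SubElement(parent, "outline", title=name)
            parent = folder
        ET.SubElement(parent, "outline", title=title, xmlUrl=xml_url)
    return opml


opml = feeds_to_opml([
    ("Tech/XML", "Feed A", "http://example.com/a.xml"),
    ("Tech", "Feed B", "http://example.com/b.xml"),
    ("", "Feed C", "http://example.com/c.xml"),
])
```

Writing every feed directly under body instead would produce the flat structure that earlier versions wrote out.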


 

Categories: RSS Bandit

"This paper proposes extending popular object-oriented programming languages such as C#, VB or Java with native support for XML. In our approach XML documents or document fragments become first class citizens. This means that XML values can be constructed, loaded, passed, transformed and updated in a type-safe manner. The type system extensions, however, are not based on XML Schemas. We show that XSDs and the XML data model do not fit well with the class-based nominal type system and object graph representation of our target languages. Instead we propose to extend the C# type system with new structural types that model XSD sequences, choices, and all-groups. We also propose a number of extensions to the language itself that incorporate a simple but expressive query language that is influenced by XPath and SQL. We demonstrate our language and type system by translating a selection of the XQuery use cases."

From Programming with Rectangles, Triangles, and Circles by Erik Meijer and Wolfram Schulte

I talk to Erik about this stuff all the time, so it's great to finally see some of the thoughts and discussions around this topic actually written down in a research paper. According to Erik's blog post from a few weeks ago, he'll actually be presenting on this at XML 2003.


 

Categories: XML

According to C|Net News

Amazon.com on Thursday unveiled a new service that lets bookworms search through pages of thousands of books available on its online store.

The service, dubbed "Search Inside the Book," lets people type in any keyword and receive results for all the pages and titles of various books that contain that term. In the past, Amazon customers could search only by author name, title or keyword.

I am impressed by how in one move Amazon made their search feature utterly useless. I just tried to search for "open source xml" and "java xml" books on Amazon and it was a fucking disaster. Even the top 10 hits that were returned were polluted with books that simply had the words "Java" or "XML" somewhere in the book. In fact almost every search I tried returned Oracle9i JDeveloper Handbook in the top 10. If ever a feature needed to be turned off by default, it is this one.


 

Categories: Ramblings

October 23, 2003
@ 06:42 PM

Every once in a while I notice links from educational institutions that use my writings for their classes in my referrer logs. It gives me the warm fuzzies to know that I'm actually [indirectly] teaching a generation of CS geeks. In the past month I've seen the following referrers/references to my writings

My corrupting influence spreads...


 

Categories: Ramblings

October 23, 2003
@ 05:56 PM

I picked up a Belkin Mobile iPod FM Transmitter on a whim last night. At first, I had issues with the amount of static and feedback coming from the speakers, but once I figured out that I was supposed to turn down the volume on my iPod and turn it up on the car stereo it was heaven. Since this was an impulse buy I didn't shop around, but if I had I may have decided on an iTrip instead since there are no dangling wires and batteries are not required. I'll see how I feel about the Belkin device in a week or so.

According to Slashdot, B0rg Central didn't have anything nice to say about the launch of iTunes on Windows. Looking beyond what seem like obvious sour grapes, it is a bummer that iPods don't support the WMA format.

My favorite B0rg hater, Russell Beattie, has this to say about the iPod

So here's my thoughts: 1) The current iPod needs a successor and soon because consumers will start to balk at the B&W interface. 2) With the color screen and all that storage, it'd be dumb not to show multimedia like Photos and Video. 3) If Apple's going to show multimedia, they'll probably want to use Quicktime to do it... 4) If they're going that route, they'd need a Mobile OS to run it on. (Not to mention for other needs like supporting Wireless access to the iPod via WiFi or Bluetooth).

I guess I'm about to reveal myself as being a Luddite, but I have no problem with the B&W iPod interface nor am I interested in taking pictures or playing videos on my music player. This annoying convergence of features has not interested me in my cell phones (which happen to have lost useful features over time, like password protected address books, in favor of frivolous shit like games, web browsing and taking pictures), and I definitely don't want it in my music player, especially if it keeps the price high instead of allowing it to drop to a more reasonable amount so I can pick up a few as Xmas gifts.


 

Categories: Ramblings

Many have complained that one of the major problems with RSS aggregators is that if one uses an aggregator on multiple machines (such as at home and at work) then there is no easy way to synchronize the readers on both machines. This was one of the problems I set out to solve when I first started working on RSS Bandit. Now, thanks to some prodding from some of the co-developers on the RSS Bandit workspace, multiple solutions have been implemented. Click below for details.
 

Categories: RSS Bandit

October 21, 2003
@ 07:35 AM

Dave Winer writes

Just had a phone talk with Scoble, and finally I have a clue why people use aggregators integrated with email clients. He had a couple of compelling reasons. 1. Since it's integrated with email he can easily forward an item to people he works with via email. 2. He has a folder where he drags items he wants to write about later. BTW he uses NewsGator. I still prefer the blog-style interface of Radio's aggregator.

Both of which are features RSS Bandit supports. There is one feature requested by Jeff Sandquist which NewsGator has and RSS Bandit does not: the ability to specify a username/password combo when accessing a particular feed. Torsten and I will see about getting this in by the weekend so Jeff can use it next week.

My bed beckons but so do my recent purchases that just arrived in the mail; Chinese Super Ninja and Shaolin Challenges Ninja.

Bah, sleep is for the weak.


 

Categories: RSS Bandit

October 19, 2003
@ 07:56 PM
The original impetus for designing XML was to create "SGML on the Web". Six years later, although XML has found widespread applicability in the software industry, it seems to have failed at its original goal. Some thoughts about this follow.
 

Categories: XML