October 22, 2004
@ 05:01 AM

It's hard to tell that the headlines Google misses earnings expectations and Google's 3Q Profit More Than Doubles are about the same occurrence. I guess it's all about perspective. The market seems to be taking the glass-half-full approach, seeing as GOOG jumped by $8.89 to close at $149.38.

Even more impressive is that it is trading at $161.79 in after-hours trading. And I thought the original IPO price of $135 was excessive. Dang.


 

Karl Waclaweck has released version 1.0 of the SAX for .NET project. In the announcement on XML-DEV Karl writes

This is the first production release of the C#/.NET port of the SAX API.
It should be compatible with MS.NET 1.1 and Mono 1.0.2.

Since the API alone is not enough, a SAX parser implementation has been
released as well: SAXExpat.NET 1.0. It is based on the Expat parser, and
will work on MS.NET 1.1. Currently Mono 1.0.2 is not able to run it,
but this will hopefully change with future Mono releases.

Another implementation based on a port of the AElfred parser will
be available soon. It should work under both, MS.NET and Mono.

The project page is here: http://saxdotnet.sourceforge.net/

It's good to see more Open Source XML projects showing up for the .NET Framework. I haven't missed using SAX that much, but I imagine people coming to the .NET Framework from the Java world would love to be able to keep using their favorite push-model XML parsing API.
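
For the curious, here's a minimal sketch of what push-model parsing looks like with a SAX-style API in C#. The namespace, base class and member names below are assumptions based on the Java SAX API the project ports, not the project's confirmed surface; check the project page for the real thing.

using System;
using Org.System.Xml.Sax; // assumed namespace for the SAX for .NET port

// A handler the parser pushes events into as it reads the document.
class TitleHandler : DefaultHandler // assumed helper base class, as in Java SAX
{
    private bool inTitle;

    public override void StartElement(string uri, string localName,
                                      string qName, IAttributes atts)
    {
        if (localName == "title") inTitle = true;
    }

    public override void Characters(char[] ch, int start, int length)
    {
        if (inTitle) Console.Write(new string(ch, start, length));
    }

    public override void EndElement(string uri, string localName, string qName)
    {
        if (localName == "title") { inTitle = false; Console.WriteLine(); }
    }
}

With XmlReader you drive the parse by calling Read() in a loop; with SAX the parser drives and your handler reacts, which is exactly the inversion Java folks are used to.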


 

Categories: XML

It seems about half the feeds in my aggregator are buzzing with news of the new Google Desktop Search. Although I don't really have the need for a desktop search product, I was going to download it and try it out anyway until I found out it uses a web browser interface accessed via a local web server. Not being a fan of browser-based user interfaces, I decided to pass. Since then I've seen a couple of posts from people like Joe Gregorio and Julia Lerman who've claimed that Google Desktop Search delivers the promise of WinFS today.

Full-text search is really orthogonal to what WinFS is supposed to enable on the Windows platform. I've written about such misconceptions in the past, most recently in my post Killing the "WinFS is About Making Search Better" Myth, where I wrote

At its core, WinFS was about storing strongly typed objects in the file system instead of opaque blobs of bits. The purpose of doing this was to make accessing and manipulating the content and metadata of these files simpler and more consistent. For example, instead of having to know how to manipulate JPEG, TIFF, GIF and BMP files there would just be a Photo item type that applications would have to deal with. Similarly one could imagine just interacting with a built in Music item instead of programming against MP3, WMA, OGG, AAC, and WAV files. In talking to Mike Deem a few months ago and recently seeing Bill Gates discuss his vision for WinFS to folks in our building a few weeks ago it is clear to me that the major benefits of WinFS to end users is the possibilities it creates in user interfaces for data organization.

Recently I switched from using WinAmp to iTunes on the strength of the music organizational capabilities of the iTunes library and "smart playlists". The strength of iTunes is that it provides a consistent interface for interacting with music files regardless of their underlying type (AAC, MP3, etc), provides ways to add metadata about these music files (ratings, number of times played), and then organizes these files according to this metadata. Another application that shows the power of data organization based on rich, structured metadata is Search Folders in Outlook 2003. When I used to think of WinFS I got excited about being able to perform SQL-like queries over items in the file system. Then I heard Bill Gates and Mike Deem speak about WinFS, saw them getting excited about the ability to take the data organizational capabilities of features like the My Pictures and My Music folders in Windows to the next level, and it all clicked.

Now this isn't to say that there aren't some searches made better by coming up with a consistent way to interact with certain file types and providing structured metadata about these files. For example, a search like

Get me all the songs [regardless of file type] either featuring or created by G-Unit or any of its members (Young Buck, 50 Cent, Tony Yayo or Lloyd Banks) between 2002 and 2004 on my hard drive

is made possible with this system. However, it's more likely that I'd want to navigate this in a UI like the iTunes media library than type the equivalent of SQL queries over my file system.
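
To make the comparison concrete, here's roughly what that query could look like as code. This is a purely hypothetical sketch: WinFS hasn't shipped, its query APIs are still in flux, and the ItemContext class, Song type and filter syntax below are all invented for illustration.

// Hypothetical WinFS-style query -- every identifier here is invented for
// illustration; the eventual WinFS query API may look nothing like this.
ItemContext ctx = ItemContext.Open();
FindResult songs = ctx.FindAll(typeof(Song),
    "(Artist = 'G-Unit' OR Artist = 'Young Buck' OR Artist = '50 Cent' " +
    "OR Artist = 'Tony Yayo' OR Artist = 'Lloyd Banks') " +
    "AND Year >= 2002 AND Year <= 2004");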

Technologies like Google Desktop Search solve a problem a few people have while WinFS is aimed at solving a problem most computer users have. The problem Google Desktop Search mainly solves is how to locate a single file in your file system that may not be easy to navigate to via the traditional file system explorer. However, most computer users put files in a few locations on their file system, so they usually know where to find the file they need. Most of the time I put files in one of four folders on my hard drive:

  1. My Documents
  2. My Music
  3. Visual Studio Projects (subfolder of My Documents)
  4. My Download Files 

For some of my friends you can substitute the "My Pictures" folder for the "Visual Studio Projects" folder. I also know a number of people who just drop everything on their Windows desktop. However, the point is still the same: lots of computer users store a large amount of their content in a single location where it eventually becomes hard to manipulate, organize and visualize the hundreds of files contained therein. The main reason I stopped using WinAmp was that the data organization features of Windows Explorer are so poor. Basically, all I have when dealing with music files is 'sort by type' or some variation of 'sort by name' and a list view. iTunes changed the way I listened to music because it made it extremely easy to visualize and navigate my music library. The ability to perform rich ad-hoc queries via Smart Playlists is also powerful, but it's a feature I rarely use.

Tools like Lookout and Google Desktop Search are a crutch to get around the fact that the file navigation metaphor on most desktop systems is past its prime and in dire need of improvement. This isn't to say fast full-text search isn't important; even with all the data organizational capabilities of Microsoft Outlook, I still tend to use Lookout when looking for emails sent more than a few weeks ago. However, it is not the high-order bit in solving the problems most computer users have with locating and interacting with the files on their hard drives.

The promise of WinFS is that it aims to turn every application [including file navigation applications like Windows Explorer] into the equivalent of Outlook and iTunes when it comes to data visualization and navigation by baking such functionality into the file system APIs and data model. Trying to reduce that to "full-text search plus indexing" is missing the forest for the trees. Sure, that may get you part of the way, but in the end it's like driving a car with your feet. There is a better way and it is much closer than most people think.


 

Categories: Technology

In his post The gender profile of Wikipedia, Joi Ito writes

I haven't conducted any scientific analysis or anything, but Wikipedia seems much more gender balanced than the blogging community. I know many people point out that ratio of men at conferences on blogging and ratio of men who have loud blog voices seems to be quite high.

The core mistake Joi Ito makes here is assuming that participation is equivalent to talking about participation. I've seen several statistics and surveys on blogosphere (God, I hate that word) participation which all seem to point to the same conclusion: female bloggers tend to outnumber male bloggers.

For example, according to the LiveJournal statistics page there are twice as many females blogging as males in that community. Given that LiveJournal is one of the oldest and largest blogging communities, with almost 2 million active blogs (and almost 5 million user accounts), I think this counts for a lot more than the observation that few women are seen at geeky conferences like Web 2.0 or Tim O'Reilly's FooCamp.

One thing I've found interesting to ponder is why, if there are more female bloggers than male bloggers, lists such as the Technorati Top 100 are dominated by male bloggers. On Friday I attended a talk by Susan Herring entitled Conversations in the blogosphere: An analysis "from the bottom up," which shed some light on the issue. She discussed research she and her colleagues had undertaken to discover the nature of conversations in the blogosphere. Interesting data points from her presentation include:

  • Blogs with more outgoing links tend to be more linked to
  • Women tend to link less than men (even in women-centric blog circles such as homeschooling, the rate of linking is comparatively lower than in male-centric blog circles like warblogging)
  • Linking doesn't tend to be reciprocal
  • Only a small percentage of the blogosphere is interlinked
  • 67% - 75% of blogs surveyed don't link outward to other blogs
  • 95% of blogs surveyed have fewer than 10 inbound links from other blogs

Some of the data points from Susan's presentation gave me some ideas as to why the various lists of popular weblogs always seem to be male dominated.

The methodology and results of Susan's research can be obtained from her paper, Conversations in the blogosphere: An analysis "from the bottom up."


 

Categories: Ramblings

Derek Denny-Brown, the dev lead for both MSXML & System.Xml, who was involved with XML before it even had a name, has finally started a blog. Derek's first XML-related post is Where XML goes astray..., which points out three features of XML that have caused significant problems for users and implementers of XML technologies. He writes

First, some background: XML was originally designed as an evolution of SGML, a simplification that mostly matched a lot of then existing common usage patterns. Most of its creators saw XML as evolving and expanding the role of SGML, namely text markup. XML was primarily intended to support taking a stream of text intended to be interpreted as a human readable document, and delineate portions according to some role. This sequence of characters is a paragraph. That sequence should be displayed with a link to some other information. Et cetera, et cetera. Much of the process of defining XML was based on the assumption that the text in an XML document would eventually be exposed for human consumption. You can see this in the rules for what characters are allowed in XML content, what are valid characters in Names, and even in "</tagname>" being required rather than just "</>".
...
Allowed Characters
The logic went something like this: XML is all about marking up text documents, so the characters in an XML document should conform to what Unicode says are reasonable for a text document. That rules out most control characters, and means that surrogate pairs should be checked. All sounds good until you see some of the consequences. For example, most databases allow any character in a text column. What happens when you publish your database as XML? What do you do about values that include characters which are control characters that the XML specification disallowed? XML did not provide any escaping mechanism, and if you ask many XML experts they will tell you to base64 encode your data if it may include invalid characters. It gets worse.

The characters allowed in an XML name are far more limited. Basically, when designing XML, they allowed everything that Unicode (as defined then) considered a ‘letter’ or a ‘number’. Only 2 problems with that: (1) It turns out many characters common in Asian texts were left out of that category by the then-current Unicode specification. (2) The list of characters is sparse and random, making implementation slow and error prone.
...
Whitespace
When we were first coding up MSXML, whitespace was one of our perpetual nightmares. In hand-authored XML documents (the most common form of documents back then), there tended to be a great deal of whitespace. Humans have a hard time reading XML if everything is jammed on one line. We like a tag per line and indenting. All those extra characters, just there so that our feeble minds could make sense of this awkward jumble of characters, ended up contributing significantly to our memory footprint, and caused many problems to our users. Consider this example:
 <customer>  
           <name>Joe Schmoe</name>  
           <addr>123 Seattle Ave</addr> 
  </customer>
A customer coming to XML from a database background would normally expect that the first child of the <customer> element would be the <name> element. I can’t explain how many times I had to explain that it was actually a text node with the value newline+tab.
...
XML Namespaces
Namespaces is still, years after its release, a source of problems and disagreement. The XML Namespaces specification is simple and gets the job done with minimum fuss. The problem? It pushes an immense burden of complexity onto the APIs and XML reader/writer implementations. Supporting XML Namespaces introduces significant complexity in the parsers, because it forces parsers to parse the entire start-tag before returning any text information. It complicates XML stores, such as DOM implementations, because the XML Namespace specification only discusses parsing XML, and introduces a number of serious complications to edit scenarios. It complicates XML writers, because it introduces new constraints and ambiguities.

Then there is the issue of the 'default namespace’. I still see regular emails from people confused about why their XPath doesn’t work because of namespace issues. Namespaces is possibly the single largest obstacle for people new to XML.

My experiences as the program manager for the majority of the XML programming model in the .NET Framework agree with this list. The above hits the 3 most common areas people seem to have problems with when working with XML in the .NET Framework. His blog post makes a nice companion piece to my MSDN article The XML Litmus Test: Understanding When and Why to Use XML.
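
Both the whitespace and default-namespace pitfalls Derek describes show up constantly in System.Xml support questions. Here's a small demonstration of the two gotchas using the v1.1 DOM APIs (the document contents are made up):

using System;
using System.Xml;

class XmlGotchas
{
    static void Main()
    {
        // Gotcha 1: whitespace. With whitespace preserved (as MSXML does by
        // default), the first child of <customer> is a text node, not <name>.
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = true;
        doc.LoadXml("<customer>\n  <name>Joe Schmoe</name>\n</customer>");
        Console.WriteLine(doc.DocumentElement.FirstChild.NodeType); // Whitespace

        // Gotcha 2: the default namespace. An unprefixed XPath step never matches
        // elements in a default namespace; a prefix must be registered for it.
        XmlDocument doc2 = new XmlDocument();
        doc2.LoadXml("<customer xmlns='urn:example'><name>Joe</name></customer>");
        Console.WriteLine(doc2.SelectNodes("/customer/name").Count); // 0!

        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc2.NameTable);
        nsmgr.AddNamespace("e", "urn:example");
        Console.WriteLine(doc2.SelectNodes("/e:customer/e:name", nsmgr).Count); // 1
    }
}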


 

Categories: XML

As has been pointed out by others you can read your GMail inbox in any Atom-enabled aggregator that supports secure feeds. I just subscribed to my GMail inbox in RSS Bandit and it worked like a charm. Screenshot below.


 

Categories: RSS Bandit

We are in the process of locking down System.Xml for Beta 2 of the .NET Framework 2.0 and Visual Studio 2005. In the past few months we have received customer feedback about our feature set previewed in the Whidbey Alpha & Whidbey Beta 1, and this has guided our decision-making process as to where to focus our energies to ensure a comprehensive feature set.

Below is the list of changes to System.Xml and subsidiary namespaces that have occurred between Beta 1 and Beta 2 of the .NET Framework 2.0 release.

ADDITIONS

XmlSchemaValidator

The XmlSchemaValidator class provides a push-model API for W3C XML Schema validation. The primary scenario for using XmlSchemaValidator is validating an XML infoset in place without having to serialize it as an XML document and then reparse the document using a validating XML reader.
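
A rough sketch of the usage pattern, for instance validating content pushed from a non-XML source. I'm going from the general shape of the class described here, so the exact Beta 2 signatures may differ, and the schema and element names are placeholders.

using System;
using System.Xml;
using System.Xml.Schema;

class PushValidation
{
    static void Main()
    {
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add("urn:example", "customer.xsd"); // placeholder schema

        NameTable nameTable = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nameTable);
        XmlSchemaValidator validator = new XmlSchemaValidator(
            nameTable, schemas, nsManager, XmlSchemaValidationFlags.None);
        validator.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            Console.WriteLine(e.Message);
        };

        // Push an infoset at the validator node by node; no XmlReader involved.
        validator.Initialize();
        validator.ValidateElement("customer", "urn:example", null);
        validator.ValidateEndOfAttributes(null);
        validator.ValidateText("Joe Schmoe");
        validator.ValidateEndElement(null);
        validator.EndValidation();
    }
}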

CHANGES

XmlReader

  • Added overloads to the static Create() method that take XmlParserContext
  • ReadValueAsXXX() methods renamed to ReadContentAsXXX(). Also reduced the number of ReadContentAsXXX() methods relative to the number of ReadValueAsXXX() methods in Whidbey Beta 1.
  • Added ReadElementContentAsXXX() methods which are specific to obtaining the value of element nodes
  • Added methods for reading large streams of text or binary data embedded in an XML document in a streaming fashion (a usage sketch follows this list)

public virtual bool CanReadValueChunk { get; }

public virtual int ReadValueChunk (byte[] buffer, int startIndex, int count);

public virtual bool CanReadBinaryContent { get; }

public virtual int ReadContentAsBase64 (byte[] buffer, int startIndex, int count);

public virtual int ReadContentAsBinHex (byte[] buffer, int startIndex, int count);

public virtual int ReadElementContentAsBase64(byte[] buffer, int startIndex, int count);

public virtual int ReadElementContentAsBinHex(byte[] buffer, int startIndex, int count);

  • Added ReadToFollowing(string localname, string namespaceURI) which moves to the next occurrence of the named element in document order.
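
As a usage sketch, here's how one might pull a large base64-encoded payload out of a document in chunks without ever materializing it as a string. The input document and its image element are hypothetical, and I'm going by the Beta 2 API shape listed above.

using System.IO;
using System.Xml;

class Base64Extract
{
    static void Main()
    {
        byte[] buffer = new byte[4096];
        using (XmlReader reader = XmlReader.Create("photo.xml")) // hypothetical input
        using (FileStream output = File.Create("photo.jpg"))
        {
            reader.ReadToFollowing("image", ""); // hypothetical element holding base64 data
            int count;
            while ((count = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, count);
            }
        }
    }
}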

XmlReaderSettings

  • Added XmlSchemaValidationFlags enumeration to replace the following flags: IgnoreInlineSchema, IgnoreSchemaLocation, IgnoreValidationWarnings and IgnoreIdentityConstraints
  • Added existing ValidationType enumeration to replace the following flags: DtdValidate and XsdValidate
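
Putting those settings together, configuring a schema-validating reader in Beta 2 should look roughly like this (the schema and document names are placeholders):

using System.Xml;
using System.Xml.Schema;

class ValidatingRead
{
    static void Main()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.Schemas.Add("urn:example", "customer.xsd"); // placeholder schema

        using (XmlReader reader = XmlReader.Create("customer.xml", settings))
        {
            while (reader.Read()) { } // invalid content surfaces as validation errors
        }
    }
}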

XmlWriter

  • Reduced number of WriteValue() methods
  • Removed overloads of WriteStartElement and WriteStartAttribute that took an IXmlSchemaInfo parameter

XPathDocument

XPathNavigator and XPathEditableNavigator

  • The XPathEditableNavigator has been merged into the XPathNavigator, making it an editable XML cursor model API.
  • The XPathNavigator is the preferred API for exposing data as XML. This has been incorporated into the design guidelines for using XML in the .NET Framework

XmlDocument

  • The XPathNavigator returned by the CreateNavigator() method now allows one to edit the XmlDocument through the cursor model API.
  • The XmlDocument now supports XML schema validation of the entire subtree or partial validation of nodes in the document using the Validate() method
  • The following property was added to XmlDocument

public XmlSchemaSet Schemas { get; set; }

  • The following property was added to XmlNode

public virtual IXmlSchemaInfo SchemaInfo { get; }
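
A quick sketch of what this enables: edit an XmlDocument through the cursor API, then validate it in place. Again, this follows the Beta 2 shape described above, and the document content and schema are made up.

using System;
using System.Xml;
using System.Xml.Schema;
using System.Xml.XPath;

class EditAndValidate
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<customer xmlns='urn:example'><name>Joe</name></customer>");

        // Edit the document through the XPathNavigator cursor model.
        XPathNavigator nav = doc.CreateNavigator();
        nav.MoveToChild("customer", "urn:example");
        nav.MoveToChild("name", "urn:example");
        nav.SetValue("Joe Schmoe");

        // Validate the whole document in place.
        doc.Schemas.Add("urn:example", "customer.xsd"); // placeholder schema
        doc.Validate(delegate(object sender, ValidationEventArgs e)
        {
            Console.WriteLine(e.Message);
        });
    }
}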

XsltCommand

  • The XslTransform class was obsoleted in Whidbey Beta 1 and replaced by the System.Xml.Query.XsltCommand class. In Beta 2, we decided to revamp the XsltCommand API in order to make migration from XslTransform simpler. This effort also resulted in renaming the XsltCommand class to System.Xml.Xsl.XslCompiledTransform.
  • XslCompiledTransform compiles XSLT to MSIL for significantly improved performance at the cost of increased (yet still small) compile times.
  • Supports the MSXML XSLT extension functions such as format-date, format-time etc.
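
Usage stays close to the old XslTransform pattern; a minimal sketch with placeholder file names:

using System.Xml.Xsl;

class TransformDemo
{
    static void Main()
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load("report.xsl");                   // stylesheet compiles to MSIL here
        xslt.Transform("data.xml", "report.html"); // compiled form is reusable across inputs
    }
}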

Inference

  • This class has been renamed to XmlSchemaInference

XPathExpression

  • Added a static Compile() method that enables one to compile an input string containing an XPath query into an XPathExpression object
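
For instance (the query and input file are illustrative):

using System;
using System.Xml.XPath;

class CompileXPath
{
    static void Main()
    {
        // Compile once, reuse across documents.
        XPathExpression expr = XPathExpression.Compile("/customer/name");

        XPathDocument doc = new XPathDocument("customer.xml"); // placeholder input
        XPathNavigator nav = doc.CreateNavigator();
        XPathNodeIterator it = nav.Select(expr);
        while (it.MoveNext())
            Console.WriteLine(it.Current.Value);
    }
}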

REMOVALS

XmlArgumentList

To reduce the cost of churn caused by the obsoletion of XslTransform, this class has been removed. In its place, the XsltArgumentList from v1.1 can be used.

XQueryCommand

Microsoft has decided not to ship a client side XQuery implementation in .NET Framework 2.0 as our customers expect us to ship an implementation that meets the following criteria:

  • Compliant with the W3C standards
  • Functionally addresses key scenarios

As a core platform component in Windows, they also expect us to ship a product that meets the high bar of not breaking their applications when future updates are released.  After talking to key customers and partners, we have determined it is important that we cross this high bar before shipping a full implementation of XQuery in the platform. 

The best estimates tell us that the ETA for XQuery to become a W3C recommendation is the end of 2005, which does not fit with the .NET Framework 2.0 product release cycle.

In the meantime, we are shipping a well-defined small subset of XQuery in SQL Server 2005 to query information stored natively as XML data type.  This will enable new customer scenarios in SQL Server for storing and retrieving semi-structured data.

In the .NET Framework 2.0 RTM timeframe, we recommend that our customers continue to use XSLT and XPath on the client side to solve their key client-side filtering and transformation scenarios. With this in mind, we have made significant improvements to our client-side story including:

  • Performance improvements - making the .NET Framework XSLT processor the best performing processor.
  • Functional improvements - improving the usability and feature set of the existing .NET Framework processor

Note: As a result of not shipping XQuery, XML Views (which used mapping and XQuery to query SQL Server 2005) and the XmlAdapter (used to perform updates), both originally previewed in the PDC Alpha release of .NET v2.0, have also been removed. These were removed in the Beta 1 release.


 

Categories: Life in the B0rg Cube | XML

I just finished writing last month's Extreme XML column*, entitled The XML Litmus Test: Understanding When and Why to Use XML. The article is a more formal write-up of my weblog post The XML Litmus Test, expanded to contain examples of appropriate and inappropriate uses of XML as well as fleshing out some of the criteria for choosing XML. Below is an excerpt from the article which contains the core bits that I hope everyone who reads it remembers:

XML is the appropriate tool for the job if the following criteria are satisfied by choosing XML as the data representation format for a given application.

  1. there is a need to interoperate across multiple software platforms
  2. one or more of the off-the-shelf tools for dealing with XML can be leveraged when producing or consuming the data
  3. parsing performance is not critical
  4. the content is not primarily binary content such as a music or image file
  5. the content does not contain control characters or any other characters that are illegal in XML

If the expected usage scenario does not satisfy most or all of the above criteria then it doesn't make much sense to use XML as the data representation format for the situation in question.

As the program manager responsible for XML programming models and schema validation in the .NET Framework, I've seen lots and lots of inappropriate usage of XML from both internal teams and our customers. Hopefully once this article is published I can stop repeating myself and just send people links to it the next time I see someone asking how to escape control characters in XML or see another online discussion of "binary" XML.
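
On the control characters question, the standard answer bears repeating: a character like U+0007 is illegal in XML 1.0 even as a character reference, so the usual workaround is to base64-encode any value that might contain one. A small sketch using the v1.1 APIs (the element name is made up):

using System;
using System.Text;
using System.Xml;

class EncodeControlChars
{
    static void Main()
    {
        string raw = "data\u0007with\u0007bells"; // contains an illegal XML character

        XmlTextWriter writer = new XmlTextWriter(Console.Out);
        // writer.WriteElementString("value", raw); // would emit unparseable XML
        string encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(raw));
        writer.WriteElementString("value", encoded); // consumers must base64-decode
        writer.Close();
    }
}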

* Yes, it's late


 

Categories: XML

A few days ago I saw the article Xamlon looks to beat Microsoft to the punch on C|Net, which begins

On Monday, Colton's company Xamlon released its first product, a software development kit designed to speed development of user interface software for Web applications. Xamlon built the program from the published technical specifications of Microsoft's own user interface development software, which Microsoft itself doesn't plan to release until 2006.

I've been having difficulty processing this news over the past few days. Reading the Xamlon homepage gives more cause for pause. The site proclaims

XAML is a revolution in Windows application development. Xamlon is XAML today.

  • Rapidly build Windows user interfaces with HTML-like markup
  • Easily draw user interfaces and convert directly to XAML
  • Deploy to the Windows desktop and to Internet Explorer with absolutely no changes to your application.
  • Run XAML applications on versions of Windows from ’98 to Longhorn, and via the Web with Internet Explorer
  • Write applications that port easily to Avalon

What I find interesting about this are the claims involving unreleased products that aren't expected to beta until next year and won't ship until the year after. I can understand that it is cool to claim to have scooped Microsoft, but considering that XAML and Longhorn are still being worked on, it seems strange to claim to have built a product that is compatible with unreleased Microsoft products.

While writing this blog entry I decided to take a quick glance at the various Avalon folks' blogs and I stumbled on a post entitled Attribute grammar for xaml attributes from Rob Relyea, a program manager on the Avalon team.  Rob writes

As part of this change, the flexibility that people have with compact syntax will be reduced. Today, they can use *Bind(), *Button(), *AnyClass() in any attribute. We'd like to restrict this to a small set of classes that are explicitly in need of being set in an attribute.

I'm not going into great detail in the description of our fix because I'd prefer to be the first company to ship our design.  :-)

Considering that XAML isn't even in beta yet, one can expect a lot more changes to the language before it ships in 2006. Yet Xamlon claims to be compatible with XAML. I have no idea how the Xamlon team plans to make good on their promise to be compatible with XAML and Longhorn before those products have even shipped, but I'd love to see what developers out there think about this topic.

I totally empathize with the Avalon team right now. I'm in the process of drafting a blog post about the changes to System.Xml in the .NET Framework that have occurred between Whidbey Beta 1 and Whidbey Beta 2. Even though we don't have companies building products based on interim versions of System.Xml, we do have book authors who'll have to rewrite [or maybe even eliminate] chapters about our stuff based on the changes, and then of course there's the Mono folks implementing System.Xml who seem to be tracking our Whidbey betas.


 

I've been trying to come up with a list of the most disappointing movie sequels or prequels of all time. So far I've come up with three, then got stuck.

  1. Phantom Menace, prequel to the Star Wars Trilogy
  2. Matrix Revolutions, sequel to Matrix Reloaded
  3. Escape from LA, sequel to Escape from NY

I'm curious as to what suggestions others have for filling out this list. There is one rule that has to be obeyed when submitting entries to this list: sequels that went straight to video do not count. So the various Disney sequels to their major hits like Aladdin or Beauty & the Beast don't count, nor do movies like Cruel Intentions 3 or Children of the Corn 7.

So which are your nominees for most disappointing movie sequel?


 

Categories: Ramblings