February 10, 2004
@ 05:30 PM

Robert Scoble has a post entitled Metadata without filling in forms? It's coming where he writes

Simon Fell read my interview about search trends and says "I still don't get it" about WinFS and metadata. He brings up a good point. If users are going to be forced to fill out metadata forms, like those currently in Office apps, they just won't do it. Fell is absolutely right. But he assumed that metadata would need to be entered that way for every photo. Let's go a little deeper....OK, I have 7400 photos. I have quite a few of my son. So, let's say there's a new kind of application. It recognizes the faces automatically and puts a square around them, prompting you to enter just a name. When you do, the square changes color from red to green, or just disappears completely.
...
A roadblock to getting that done today is that no one in the industry can get along for enough time to make it possible to put metadata into files the way it needs to be done. Example: look at the social software guys. Friendster doesn't play well with Orkut which doesn't play well with MyWallop, which doesn't play well with Tribe, which doesn't play well with ICQ, which doesn't play well with Outlook. What's the solution? Fix the platform underneath so that developers can put these features in without working with other companies and/or other developers they don't usually work with.

The way WinFS is being pitched by Microsoft folks reminds me a lot of Hailstorm [which is probably unsurprising since a number of Hailstorm folks work on it] in that there are a lot of interesting and useful technical ideas burdened by bad scenarios being hung on them. Before going into the interesting and useful technical ideas around WinFS, I'll start with why I consider the two scenarios mentioned by Scoble “bad scenarios”.

The notion that making the file system a metadata store automatically makes search better is a dubious proposition once you realize that a number of the searches people can't do today wouldn't be helped much by more metadata. This isn't to say some searches wouldn't work better (e.g. searching for songs by title or artist). However, there are search scenarios, such as finding a particular image or video among a bunch of files with generic names or finding a song by its lyrics, for which simply being able to tag media files with metadata doesn't seem like enough. Once your scenarios start having to involve “face recognition software” or “cameras with GPS coordinates” to work, it is hard for people not to scoff. It's like a variation of the popular Slashdot joke

  1. Add metadata search capabilities to file system
  2. ???
  3. You can now search for “all pictures taken at Tommy's 5th birthday party at the Chuck E Cheese in Redmond”.

with the ??? in the middle implying a significant difficulty in going from step 1 to step 3.

The other criticism is that Robert's post implies the reasons applications can't talk to each other are technical. This is rarely the case. The main reason applications don't talk to each other isn't a lack of technology [especially now that we have a well-defined format for exchanging data called XML] but various social and business reasons. There are no technical reasons MSN Messenger can't talk to ICQ or that prevent Yahoo! Messenger from talking to AOL Instant Messenger. It isn't technical reasons that prevent my data in Orkut from being shared with Friendster or my book & music preferences in Amazon from being shared with other online stores I visit. All of these entities feel they have a competitive advantage in making it hard to migrate from their platforms.

The two things Microsoft needs to do in this space are to (i) show how & why it is beneficial for different applications to share data locally and (ii) provide guidelines as well as best practices for applications to share their data in a secure manner.

While talking to Joshua Allen, Dave Winer, Robert Scoble, Lili Cheng, and Curtis Wong yesterday, it seemed clear to me that social software [or, if you are a business user, groupware that is more individual-focused and gives people more control over content and information sharing] would be a very powerful and useful tool for businesses and end users if built on a platform like Longhorn with a smart data store that knows how to create relationships between concepts as well as files (i.e. WinFS) and a flexible, cross-platform distributed computing framework (i.e. Indigo).

The WinFS folks and Longhorn evangelists will probably keep focusing on what I have termed “bad scenarios” because they demo well, but I suspect they'll have difficulty getting traction with them in the real world. Of course, I may be wrong and the various people who've expressed incredulity at the current pitches are a vocal minority who'll be proved wrong once others embrace the vision. Either way, I plan to experiment with these ideas once Longhorn starts to beta and see where the code takes me.


 

Categories: Technology

February 10, 2004
@ 05:59 AM

As Joshua wrote in his blog, we had lunch with Dave Winer this afternoon. We talked about the kind of stuff you'd have expected: RSS, ATOM and "social software". An interesting person at lunch was Lili Cheng, who's the Group Manager of the Social Computing Group in Microsoft Research*. She was very interested in the technologies around blogging and thought “social software” could become a big deal if handled correctly. Her group is behind Wallop, and I asked if she'd be able to wrangle an invitation so I could check it out. Given my previous negative impressions of social software I'm curious to see what the folks at MSR have come up with. She seemed aware of the limitations of the current crop of “social software” that is hip with some members of the blogging crowd, so I'd like to see what she thinks they do differently. I think a fun little experiment would be seeing what it would be like to integrate some interaction with “social software” like Wallop into RSS Bandit. Too bad my free time is so limited.

* So MSFT has a Social Computing Group and Google has Orkut? If I worked at Friendster, it would seem the exit strategy is clear: try to get bought by Yahoo! before the VC funds dry up.


 

Categories: Ramblings

In his blog post entitled Namespaces in Xml - the battle to explain, Steven Livingstone wrote

It seems that Namespaces is quickly displacing Xml Schema as the thing people "like to hate" - well at least those that are contacting me now seem to accept Schema as "good".

Now, the concept of namespaces is pretty simple, but because it happens to be used explicitly (and is a more manual process) in Xml people just don't seem to get it. There were two core worries put to me - one calling it "a mess" and the other "a failing". The whole thing centered around having to know what namespaces you were actually using (or were in scope) when selecting given nodes. So in the case of SelectNodes(), you need to have a namespace manager populated with the namespaces you intend to use. In the case of Schema, you generally need to know the targetNamespace of the Schema when working with the XmlValidatingReader. What the guys I spoke with seemed to dislike is that you actually have to know what these namespaces are. Why bother? Don't use namespaces and just do your selects or validation.

Given that I am to some degree responsible for both classes mentioned in the above post, XmlNode (where SelectNodes() comes from) and XmlValidatingReader, I feel compelled to respond.

The SelectNodes() problem is that people would like to run XPath expressions over nodes without having to worry about namespaces. For example, given XML such as

<root xmlns="http://www.example.com">
  <child />
</root>

to perform a SelectNodes() or SelectSingleNode() that returns the <child> element requires the following code

  XmlDocument doc = new XmlDocument(); 
  doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>"); 
  XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable); 
  nsmgr.AddNamespace("foo", "http://www.example.com");  //this is the tricky bit 
  Console.WriteLine(doc.SelectSingleNode("/foo:root/foo:child", nsmgr).OuterXml);   

whereas developers don't see why the code isn't something more along the lines of

  XmlDocument doc = new XmlDocument(); 
  doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>"); 
  Console.WriteLine(doc.SelectSingleNode("/root/child").OuterXml);   

which would be the case if there were no namespaces in the document.

The reason the latter code sample doesn't work is that the select methods on the XmlDocument class conform to the W3C XPath 1.0 recommendation, which is namespace aware. In XPath, path expressions that match nodes based on their names are called node tests. A node test is a qualified name, or QName for short. A QName is syntactically an optional prefix and a local name separated by a colon. The prefix is supposed to be mapped to a namespace and is not to be used literally in matching the expression. Specifically, the spec states

A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded). It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.

There are a number of reasons why this is the case which are best illustrated with an example. Consider the following two XML documents

<root xmlns="urn:made-up-example">
  <child xmlns="http://www.example.com" />
</root>

<root>
  <child />
</root>

Should the query /root/child also match the <child> element in the above two documents, as it does for the original document in this example? The three documents shown [including the first example] are completely different documents, and there is no consistent, standards-compliant way to match against them using QNames in path expressions without explicitly pairing prefixes with namespaces.

The only way to give people what they want in this case would be to come up with a proprietary version of XPath that is namespace agnostic. We do not plan to do this. However, I do have a tip for developers that reduces the amount of code the examples take to write. The following code matches the <child> element in all three documents and is fully conformant with the XPath 1.0 recommendation

XmlDocument doc = new XmlDocument(); 
doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>"); 
Console.WriteLine(doc.SelectSingleNode("/*[local-name()='root']/*[local-name()='child']").OuterXml);  
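
For completeness, here's a quick sketch [assuming the usual System and System.Xml namespaces are imported] that runs the same namespace-agnostic query over all three example documents to show it matches the <child> element in each one:

  string[] documents = {
    "<root xmlns='http://www.example.com'><child /></root>",
    "<root xmlns='urn:made-up-example'><child xmlns='http://www.example.com' /></root>",
    "<root><child /></root>"
  };

  foreach(string xml in documents){
    XmlDocument d = new XmlDocument();
    d.LoadXml(xml);
    //local-name() ignores namespaces, so the same query works on every document
    Console.WriteLine(d.SelectSingleNode("/*[local-name()='root']/*[local-name()='child']").OuterXml);
  }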

Now on to the XmlValidatingReader issue. Assume we are given the following XML instance and schema

<root xmlns="http://www.example.com">
 <child />
</root>

<xs:schema targetNamespace="http://www.example.com"
            xmlns:xs="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
       
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="child" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

The instance document can be validated against the schema using the following code

XmlTextReader tr = new XmlTextReader("example.xml");
XmlValidatingReader vr = new XmlValidatingReader(tr);
vr.Schemas.Add(null, "example.xsd");

vr.ValidationType = ValidationType.Schema;
//ValidationHandler is a user-defined callback invoked when validation errors occur
vr.ValidationEventHandler += new ValidationEventHandler(ValidationHandler);

while(vr.Read()){ /* do stuff or do nothing */ }

As you can see, you do not need to know the target namespace of the schema to perform schema validation using the XmlValidatingReader. However, many code samples in our SDK specify the target namespace where I specified null above when adding schemas to the Schemas property of the XmlValidatingReader. When null is specified it indicates that the target namespace should be obtained from the schema. This would have been clearer if we'd had an overload of the Add() method that took only the schema, but we didn't. Hindsight is 20/20.
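
In other words, the two calls below are equivalent for the example above [the explicit namespace string is just the targetNamespace declared in the schema]:

  vr.Schemas.Add(null, "example.xsd"); //target namespace is read from the schema itself
  vr.Schemas.Add("http://www.example.com", "example.xsd"); //target namespace stated up front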


 

Categories: XML

February 8, 2004
@ 10:15 PM

I noticed Gordon Weakliem reviewed ATOM.NET, an API for parsing and generating ATOM feeds. I went to the ATOM.NET website and decided to take a look at the ATOM.NET documentation. The following comments come from two perspectives: the first is as a developer who'll most likely have to implement something akin to ATOM.NET for RSS Bandit's internal workings, and the other is as one of the folks at Microsoft whose job it is to design and critique XML-based APIs.

  • The AtomWriter class is superfluous. The class has only one method, Write(AtomFeed), which makes more sense on the AtomFeed class since an object should know how to persist itself. This is the model we followed with the XmlDocument class in the .NET Framework, which has an overloaded Save() method [see the sketch after this list]. The AtomWriter class would be quite useful if it allowed you to perform schema-driven generation of an AtomFeed, the same way the XmlWriter class in the .NET Framework is aimed at providing a convenient way to programmatically generate well-formed XML [although it comes close, it doesn't fully do this in v1.0 & v1.1 of the .NET Framework]

  • I have the same feelings about the AtomReader class, which also seems superfluous. The functionality it provides is akin to the overloaded Load() method we have on the XmlDocument class in the .NET Framework. It would make more sense and be more usable if this functionality were provided as a Load() method on the AtomFeed class rather than as a separate class, unless the AtomReader class actually gains more functionality.

  • There's no easy way to serialize an AtomEntry class as XML, which means it'll be cumbersome using ATOM.NET for the ATOM API since that requires sending entries as XML over the wire. I use this functionality all the time internally in RSS Bandit, from passing entries as XML to XSLT themes to the CommentAPI to IBlogExtension.

  • There is no consideration of how to expose extension elements and attributes in ATOM.NET. As far as I'm concerned this is a deal breaker that makes ATOM.NET useless for aggregator authors, since it means they can't handle extensions in ATOM feeds even though such extensions have already started popping up in various feeds.
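
To make the first two points concrete, here's a rough sketch of the API shape I'm suggesting. The AtomFeed members below are hypothetical [ATOM.NET has no such methods today] and the XmlDocument-backed implementation is just a stand-in for real feed parsing:

  using System.IO;
  using System.Xml;

  public class AtomFeed
  {
    private XmlDocument doc = new XmlDocument();

    //Load() on the feed itself, mirroring XmlDocument.Load()
    public static AtomFeed Load(string url)
    {
      AtomFeed feed = new AtomFeed();
      feed.doc.Load(url);
      return feed;
    }

    //Save() on the feed itself, mirroring XmlDocument.Save()
    public void Save(TextWriter output)
    {
      doc.Save(output);
    }

    //easy access to the feed as XML, e.g. for XSLT themes or sending over the wire
    public string OuterXml
    {
      get { return doc.OuterXml; }
    }
  }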


 

Categories: XML

February 8, 2004
@ 09:37 PM

Lots of people seem to like the newest version of RSS Bandit. The most recent praise was the following post by Matt Griffith

I've been a Bloglines user for almost a year. I needed a portable aggregator because I use several different computers. Then a few months ago I got a TabletPC. Now portability isn't as critical since I always have my Tablet with me. I stayed with Bloglines though because none of the client-side aggregators I tried before worked for me.

I just downloaded the latest version of RSS Bandit. I love it. It is much more polished than it was the last time I tried it. Combine that with the dasBlog integration and the upcoming SIAM support and I'm in hog heaven. Thanks Dare, Torsten, and everyone else that helped make RssBandit what it is.

Also, it seems at least one user liked RSS Bandit so much that he [or she] was motivated to write an article on Getting Started with RSS Bandit. It's definitely a good starting point and something I wouldn't mind seeing become part of the official documentation once it's been edited and more details have been fleshed out.

Sweet.


 

Categories: RSS Bandit

A few weeks ago, during the follow-up to the WinFX review of the System.Xml namespace of the .NET Framework, it was pointed out that our team hadn't provided guidelines for exposing and manipulating XML data in applications. At first I thought the person who brought this up was mistaken, but after a cursory search I realized the closest thing to such a set of guidelines is Don Box's MSDN TV episode entitled Passing XML Data Inside the CLR. As good as Don's discussion is, a video stream isn't as accessible as a written article. In tandem with coming up with some of the guidelines for utilizing XML in the .NET Framework for internal purposes, I'll put together an article based on Don's MSDN TV episode with an eye towards the next version of the .NET Framework.

If you watched Don's talk and have any questions about it or require any clarifications, respond below so I can address them in the article I plan to write.


 

Categories: XML

February 8, 2004
@ 08:59 PM

Dave Winer is going to be giving a talk at Microsoft Research tomorrow. Robert Scoble is organizing a lunch before the talk with Dave and some folks at MSFT. I may or may not make it since my mom's visiting from Nigeria and I was planning to take most of the week off. Just in case I miss it, there is one thing I'd like Dave to know: most of the problems in the XML-based website syndication space could have been solved if he didn't act as if, once he'd written a spec or code for the Radio Userland aggregator, it was impossible to change. Most of the supposed “problems” with RSS would take 30 minutes to fix in the spec and about a day to fix in the Radio Userland codebase (I'm making assumptions here based on how long it would take in the RSS Bandit codebase). Instead he stonewalled, and now we have the ATOM mess. Of course, we'd still need something like the ATOM effort to bring the blogging APIs into the 21st century, but we wouldn't have to deal with incompatibilities at the website syndication level as well.

 

In a recent blog post Dave mentions that his MSR talk will mainly be about the themes from his article Howard Dean is not a soap bar. I don't really have an opinion on the content one way or the other, but I did dislike the way he applies selective memory to prove a point, specifically

In the lead-up to the war in Iraq, for some reason, people who were against the war didn't speak.

Maybe they didn't speak on the East Coast, but there was a very active anti-war movement on the West Coast, especially in the Seattle area. Actually, they did speak out on the East Coast as well; in fact, hundreds of thousands of voices all over the US and all over the world spoke out.

It makes me view the “blogs are the second coming” hype with suspicion when its boosters play fast and loose with the facts to sell their vision.


 

Categories: Life in the B0rg Cube

I've seen a lot of the hubbub about Janet Jackson's "costume reveal" at the Superbowl and tend to agree with Dan Gillmor that it was just one in a series of classless and crass things about the Superbowl. However, I've learned something new about Janet Jackson that I didn't know before this incident: she's dating Jermaine Dupri. Now, I'm not one to knock someone's lifestyle choices, but come on, Jermaine Dupri? That's almost as stunning as finding out that Whitney Houston ended up with Bobby “3 baby mamas” Brown.


 

Categories: Ramblings

February 8, 2004
@ 06:02 PM

I stumbled on a link to the MSN Search beta site while reading about Google cancelling its Spring IPO. What I find particularly interesting is that it seems to use algorithms very similar to Google's, given that it falls for Google Bombs as well. For example, check out the results for miserable failure and litigious bastards. In fact, the results are so similar that at first I thought it was a spoof site that was just calling out to Google in the background.


 

February 7, 2004
@ 11:28 PM

Yesterday I wrote an entry about the fact that, given Microsoft is such a big player in the software market, there is the perception [whether correct or incorrect] that once Microsoft enters a technology or product space, smaller players in the market will lose out as customers migrate or Microsoft outmarkets/outspends/outcompetes them. The post also dwelled on a related topic: the perception that Microsoft is fond of vaporware announcements meant to hinder progress in prospective markets.

I deleted the post after it had been on my blog for about 5 minutes. I wasn't happy with the quality of the writing and didn't feel I had properly expressed my thoughts. However, just the process of writing things down made me feel better. Having seen the effects of Microsoft entering smaller markets on existing participants at a personal level (at least one person has claimed that the fact that I created EXSLT.NET lost him business) as well as at the product-unit level (the various technologies the SQL Server Product Unit comes up with, from ADO.NET to SQL Server to core XML technologies), there were various thoughts bubbling within me, and writing them down helped me understand and then come to grips with them.

I definitely need to get a personal journal. I'd have loved to read that post one, five and ten years from now. However it wasn't really for public consumption.


 

Categories: Life in the B0rg Cube