February 11, 2004
@ 04:02 PM

One of the big problems with arguing about metadata is that one person's data is another person's metadata. I was reading Joshua Allen's blog post entitled Trolling EFNet, or Promiscuous Memories where he wrote

  • Some people deride "metacrap" and complain that "nobody will enter all of that metadata".  These people display a stunning lack of vision and imagination, and should be pitied.  Simply by living their lives, people produce immense amounts of metadata about themselves and their relationships to things, places, and others that can be harvested passively and in a relatively low-tech manner.
  • Being able to remember what we have experienced is very powerful.  Being able to "remember" what other people have experienced is also very powerful.  Language improved our ability to share experiences to others, and written language made it possible to communicate experiences beyond the wall of death, but that was just the beginning.  How will your life change when you can near-instantly "remember" the relevant experiences of millions of other people and in increasingly richer detail and varied modality?
    From my perspective it seems Joshua is confusing data and metadata. If I had a video camera attached to my forehead recording what I saw, then the actual audiovisual content of the files on my hard drive would be the data while the metadata would be information such as what date it was, where I was and who I saw. Basically, metadata is data about data. The interesting thing about metadata is that if we have enough good quality metadata then we can do things like near-instantly "remember" the relevant experiences of ourselves and millions of other people. It won't matter that all my experiences are cataloged and stored on a hard drive if the retrieval process isn't automated (e.g. being able to search for experiences by who they were shared with, where they occurred or when they occurred) and I instead have to fast forward through gigabytes of video data. The metadata ideal would be that all this extra, descriptive information is attached to my audiovisual experiences stored on disk so I could quickly search for "videos from conversations with my boss in October, 2003".
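    To make the data/metadata distinction concrete, here's a small hypothetical sketch (the class, field and method names are invented purely for illustration) of the kind of query that good metadata would make possible over a pile of recorded video

      //hypothetical sketch of metadata-driven retrieval; all names are invented for illustration
      using System;
      using System.Collections;

      class VideoClip
      {
          public string FileName;      //the audiovisual data itself lives on disk
          public DateTime RecordedOn;  //metadata: when
          public string Location;      //metadata: where
          public string[] People;      //metadata: who was there
      }

      class ExperienceSearch
      {
          //"videos from conversations with my boss in October, 2003"
          public static ArrayList FindBossConversations(ArrayList clips)
          {
              ArrayList matches = new ArrayList();
              foreach (VideoClip clip in clips)
              {
                  if (clip.RecordedOn.Year == 2003 && clip.RecordedOn.Month == 10
                      && Array.IndexOf(clip.People, "my boss") >= 0)
                  {
                      matches.Add(clip);
                  }
              }
              return matches;
          }
      }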

    This is where metacrap comes in. From Cory Doctorow's excellent article entitled Metacrap

    A world of exhaustive, reliable metadata would be a utopia. It's also a pipe-dream, founded on self-delusion, nerd hubris and hysterically inflated market opportunities.

    This applies to Joshua's vision as well. Data acquisition is easy; anyone can walk around with a camcorder or digital camera today recording everything they can. Effectively tagging that content so it can be categorized in ways that enable interesting searches is what's unfeasible. Cory's article does a much better job than I can of explaining the many different ways this is unfeasible; cameras with datestamps and built-in GPS are just the tip of the iceberg. I can barely remember dates once the event didn't happen in the recent past and wasn't a special occasion. As for built-in GPS, until the software is smart enough to convert longitude and latitude coordinates to "that Chuck E Cheese in Redmond" it only solves problems for geeks, not regular people. I'm sure technology will get better but metacrap is, and may always be, an insurmountable problem on a global network like the World Wide Web without lots of standardization.


     

    Categories: Technology

    February 11, 2004
    @ 03:36 PM

    Besides our releases, Torsten packages nightly builds of RSS Bandit for folks who want to try out bleeding edge features or test recent bug fixes without having to set up a CVS client. There is currently a bug, pointed out by James Clarke, that we think is fixed but would like interested users to help test:

    I'm hitting a problem once every day or so when the refresh thread seems to be exiting - The message refreshing Feeds never goes away and the green download icons for the set remain green forever. No feed errors are generated. Only way out is to quit.

    If you've encountered this problem on recent versions of RSS Bandit, try out the RSS Bandit build from February 9th 2004 and see if that fixes the problem. Once we figure out the root of the problem and fix it there'll be a refresh of the installer with the updated bits.

     


     

    Categories: RSS Bandit

    February 11, 2004
    @ 02:51 AM

    From Sam Ruby's slides for the O'Reilly Emerging Technology Conference

    Where are we going?

    • A draft charter will be prepared in time to be informally discussed at the IETF meeting in Seoul, Korea during the week of 29 February to 5 March 
    •  Hopefully, the Working Group itself will be approved in March 
    •  Most of the work will be done on mailing lists 
    •  Ideally, a face to face meeting of the Working Group will be scheduled to coincide with the August 1-6 meeting of the IETF in San Diego

    Interesting. Taking the spec to the IETF implies that Sam thinks it's mostly done. Well, I just hope the IETF's errata process is better than the W3C's.


     

    Categories: Technology

    February 10, 2004
    @ 05:30 PM

    Robert Scoble has a post entitled Metadata without filling in forms? It's coming where he writes

    Simon Fell read my interview about search trends and says "I still don't get it" about WinFS and metadata. He brings up a good point. If users are going to be forced to fill out metadata forms, like those currently in Office apps, they just won't do it. Fell is absolutely right. But, he assumed that metadata would need to be entered that way for every photo. Let's go a little deeper... OK, I have 7400 photos. I have quite a few of my son. So, let's say there's a new kind of application. It recognizes the faces automatically and puts a square around them. Prompting you to enter just a name. When you do, the square changes color from red to green, or just disappears completely.
    ...
    A roadblock to getting that done today is that no one in the industry can get along for enough time to make it possible to put metadata into files the way it needs to be done. Example: look at the social software guys. Friendster doesn't play well with Orkut which doesn't play well with MyWallop, which doesn't play well with Tribe, which doesn't play well with ICQ, which doesn't play well with Outlook. What's the solution? Fix the platform underneath so that developers can put these features in without working with other companies and/or other developers they don't usually work with.

    The way WinFS is being pitched by Microsoft folks reminds me a lot of Hailstorm [which is probably unsurprising since a number of Hailstorm folks work on it] in that there are a lot of interesting and useful technical ideas burdened by the bad scenarios being hung on them. Before going into the interesting and useful technical ideas around WinFS I'll start with why I consider the two scenarios mentioned by Scoble to be "bad scenarios".

    The thought that making the file system a metadata store automatically makes search better is a dubious proposition once you realize that a number of the searches people can't do today wouldn't be helped much by more metadata. This isn't to say some searches wouldn't work better (e.g. searching for songs by title or artist); however there are search scenarios, such as searching for a particular image or video among a bunch of files with generic names or searching for a song by its lyrics, for which simply being able to tag media types with metadata doesn't seem like enough. Once your scenarios start having to involve "face recognition software" or "cameras with GPS coordinates" to work, it is hard for people not to scoff. It's like a variation of the popular Slashdot joke

    1. Add metadata search capabilities to file system
    2. ???
    3. You can now search for “all pictures taken on Tommy's 5th birthday party at the Chuck E Cheese in Redmond”.

    with the ??? in the middle implying a significant difficulty in getting from step 1 to step 3.

    The other criticism is that Robert's post implies the reasons applications can't talk to each other are technical. This is rarely the case. The main reason applications don't talk to each other isn't a lack of technology [especially now that we have a well-defined format for exchanging data called XML] but various social and business reasons. There are no technical reasons why MSN Messenger can't talk to ICQ or why Yahoo! Messenger can't talk to AOL Instant Messenger. It isn't technical reasons that prevent my data in Orkut from being shared with Friendster or my book & music preferences in Amazon from being shared with other online stores I visit. All of these entities feel they have a competitive advantage in making it hard to migrate from their platforms.

    The two things Microsoft needs to do in this space are to (i) show how & why it is beneficial for different applications to share data locally and (ii) provide guidelines as well as best practices for applications to share their data in a secure manner.

    While talking to Joshua Allen, Dave Winer, Robert Scoble, Lili Cheng, and Curtis Wong yesterday it seemed clear to me that social software [or, if you are a business user, groupware that is more individual-focused and gives people more control over content and information sharing] would be a very powerful and useful tool for businesses and end users if built on a platform like Longhorn, with a smart data store that knows how to create relationships between concepts as well as files (i.e. WinFS) and a flexible, cross-platform distributed computing framework (i.e. Indigo).

    The WinFS folks and Longhorn evangelists will probably keep focusing on what I have termed "bad scenarios" because they demo well, but I suspect they'll have difficulty getting traction with them in the real world. Of course, I may be wrong and the various people who've expressed incredulity at the current pitches are a vocal minority who'll be proved wrong once others embrace the vision. Either way, I plan to experiment with these ideas once Longhorn starts to beta and see where the code takes me.


     

    Categories: Technology

    February 10, 2004
    @ 05:59 AM

    As Joshua wrote in his blog, we had lunch with Dave Winer this afternoon. We talked about the kind of stuff you'd have expected: RSS, ATOM and "social software". An interesting person at lunch was Lili Cheng, who's the Group Manager of the Social Computing Group in Microsoft Research*. She was very interested in the technologies around blogging and thought "social software" could become a big deal if handled correctly. Her group is behind Wallop and I asked if she'd be able to wrangle an invitation so I could check it out. Given my previous negative impressions of social software I'm curious to see what the folks at MSR have come up with. She seemed aware of the limitations of the current crop of "social software" that is hip with some members of the blogging crowd, so I'd like to see what she thinks her group will do differently. I think a fun little experiment would be seeing what it would be like to integrate some interaction with "social software" like Wallop into RSS Bandit. Too bad my free time is so limited.

    * So MSFT has a Social Computing Group and Google has Orkut? If I worked at Friendster it would seem the exit strategy is clear, try to get bought by Yahoo! before the VC funds dry up.


     

    Categories: Ramblings

    In his blog post entitled Namespaces in Xml - the battle to explain Steven Livingstone wrote

    It seems that Namespaces is quickly displacing Xml Schema as the thing people "like to hate" - well at least those that are contacting me now seem to accept Schema as "good".

    Now, the concept of namespaces is pretty simple, but because it happens to be used explicitly (and is a more manual process) in Xml people just don't seem to get it. There were two core worries put to me - one calling it "a mess" and the other "a failing". The whole thing centered around having to know what namespaces you were actually using (or were in scope) when selecting given nodes. So in the case of SelectNodes(), you need to have a namespace manager populated with the namespaces you intend to use. In the case of Schema, you generally need to know the targetNamespace of the Schema when working with the XmlValidatingReader. What the guys I spoke with seemed to dislike is that you actually have to know what these namespaces are. Why bother? Don't use namespaces and just do your selects or validation.

    Given that I am to some degree responsible for both classes mentioned in the above post, XmlNode (where SelectNodes() comes from) and XmlValidatingReader, I feel compelled to respond.

    The SelectNodes() problem is that people would like to evaluate XPath expressions over nodes without having to worry about namespaces. For example, given XML such as

    <root xmlns="http://www.example.com">

    <child />

    </root>

    to perform a SelectNodes() or SelectSingleNode() call that returns the <child> element requires the following code

      XmlDocument doc = new XmlDocument(); 
      doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>"); 
      XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable); 
      nsmgr.AddNamespace("foo", "http://www.example.com");  //this is the tricky bit 
      Console.WriteLine(doc.SelectSingleNode("/foo:root/foo:child", nsmgr).OuterXml);   

    whereas developers don't see why the code isn't something more along the lines of

      XmlDocument doc = new XmlDocument(); 
      doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>"); 
      Console.WriteLine(doc.SelectSingleNode("/root/child").OuterXml);   

    which would be the case if there were no namespaces in the document.

    The reason the latter code sample does not work is that the select methods on the XmlDocument class conform to the W3C XPath 1.0 recommendation, which is namespace aware. In XPath, path expressions that match nodes based on their names are called node tests. A node test is a qualified name, or QName for short. A QName is syntactically an optional prefix and a local name separated by a colon. The prefix is supposed to be mapped to a namespace and is not to be used literally in matching the expression. Specifically the spec states

    A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded). It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.

    There are a number of reasons why this is the case which are best illustrated with an example. Consider the following two XML documents

    <root xmlns="urn:made-up-example">

    <child xmlns="http://www.example.com"/>

    </root>

    <root>

    <child />

    </root>

    Should a query like /root/child also match the <child> element in the above two documents, the way developers expect it to match the original document in this example? The 3 documents shown [including the first example] are completely different documents and there is no consistent, standards compliant way to match against all of them using QNames in path expressions without explicitly pairing prefixes with namespaces.

    The only way to give people what they want in this case would be to come up with a proprietary version of XPath that is namespace agnostic. We do not plan to do this. However I do have a tip for developers that reduces the amount of code it takes to write such queries. The following code matches the <child> element in all three documents and is fully conformant with the XPath 1.0 recommendation

    XmlDocument doc = new XmlDocument(); 
    doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>"); 
    Console.WriteLine(doc.SelectSingleNode("/*[local-name()='root']/*[local-name()='child']").OuterXml);  
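
    One thing to keep in mind with this approach is that local-name() alone will match an element named child in any namespace. If the goal is just to avoid setting up an XmlNamespaceManager rather than to be truly namespace agnostic, the test can be tightened by also checking namespace-uri(), as in the following variation

    XmlDocument doc = new XmlDocument(); 
    doc.LoadXml("<root xmlns='http://www.example.com'><child /></root>"); 
    //match <child> only when it is in no namespace or in http://www.example.com 
    Console.WriteLine(doc.SelectSingleNode("/*[local-name()='root']/*[local-name()='child' and (namespace-uri()='' or namespace-uri()='http://www.example.com')]").OuterXml);  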

    Now on to the XmlValidatingReader issue. Assume we are given the following XML instance and schema

    <root xmlns="http://www.example.com">
     <child />
    </root>

    <xs:schema targetNamespace="http://www.example.com"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                elementFormDefault="qualified">
           
      <xs:element name="root">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="child" type="xs:string" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>

    </xs:schema>

    The instance document can be validated against the schema using the following code

    XmlTextReader tr = new XmlTextReader("example.xml");
    XmlValidatingReader vr = new XmlValidatingReader(tr);
    vr.Schemas.Add(null, "example.xsd");

    vr.ValidationType = ValidationType.Schema;
    vr.ValidationEventHandler += new ValidationEventHandler (ValidationHandler);

    while(vr.Read()){ /* do stuff or do nothing */ }
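
    The code above assumes a ValidationHandler callback is defined somewhere; a minimal version that simply reports any warnings or errors raised during validation could look like this

    //minimal callback to complete the snippet above; it just reports validation issues
    static void ValidationHandler(object sender, System.Xml.Schema.ValidationEventArgs args){
      Console.WriteLine("Validation {0}: {1}", args.Severity, args.Message);
    }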

    As you can see, you do not need to know the target namespace of the schema to perform schema validation using the XmlValidatingReader. However, many code samples in our SDK specify the target namespace where I specified null above when adding schemas to the Schemas property of the XmlValidatingReader. When null is specified it indicates that the target namespace should be obtained from the schema. This would have been clearer if we'd had an overload of the Add() method which took only the schema, but we didn't. Hindsight is 20/20.
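
    In other words, for this schema the following two calls end up doing the same thing, since its targetNamespace is http://www.example.com; passing null simply lets the reader pick the namespace up from the schema itself

    vr.Schemas.Add(null, "example.xsd");                     //target namespace read from the schema
    vr.Schemas.Add("http://www.example.com", "example.xsd"); //target namespace specified explicitly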


     

    Categories: XML

    February 8, 2004
    @ 10:15 PM

    I noticed Gordon Weakliem reviewed ATOM.NET, an API for parsing and generating ATOM feeds. I went to the ATOM.NET website and decided to take a look at the ATOM.NET documentation. The following comments come from two perspectives: the first as a developer who'll most likely have to implement something akin to ATOM.NET for RSS Bandit's internal workings, and the second as one of the folks at Microsoft whose job it is to design and critique XML-based APIs.

    • The AtomWriter class is superfluous. The class has only one method, Write(AtomFeed), which makes more sense on the AtomFeed class since an object should know how to persist itself. This is the model we followed with the XmlDocument class in the .NET Framework, which has an overloaded Save() method. The AtomWriter class would be quite useful if it allowed you to perform schema driven generation of an AtomFeed, the same way the XmlWriter class in the .NET Framework is aimed at providing a convenient way to programmatically generate well-formed XML [although it comes close but doesn't fully do this in v1.0 & v1.1 of the .NET Framework]. A rough sketch of the shape I have in mind follows this list.

    • I have the same feelings about the AtomReader class; it also seems superfluous. The functionality it provides is akin to the overloaded Load() method we have on the XmlDocument class in the .NET Framework. I'd say it makes more sense and is more usable if this functionality were provided as a Load() method on the AtomFeed class rather than as a separate class, unless the AtomReader class actually gets some more functionality.

    • There's no easy way to serialize an AtomEntry class as XML, which means it'll be cumbersome using ATOM.NET for the ATOM API since that requires sending entries as XML over the wire. I use this functionality all the time in RSS Bandit internally, from passing entries as XML for XSLT themes to the CommentAPI to IBlogExtension.

    • There is no consideration for how to expose extension elements and attributes in ATOM.NET. As far as I'm concerned this is a deal breaker that makes ATOM.NET useless for aggregator authors, since it means they can't handle extensions in ATOM feeds even though such extensions have already started popping up in various feeds.
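
    Here is that rough sketch. The class and member names below are my own guesses at a design and are not the actual ATOM.NET API

    //hypothetical shape modeled on XmlDocument.Load()/Save(); names are illustrative only
    public class AtomFeed {
      //the feed knows how to load and persist itself, so separate
      //AtomReader/AtomWriter classes aren't needed for the simple cases
      public void Load(string url) { /* parse the feed from the URL */ }
      public void Save(System.IO.TextWriter writer) { /* write the feed out as XML */ }

      //elements and attributes from foreign namespaces found in the feed
      public System.Collections.IList ExtensionElements = new System.Collections.ArrayList();
    }

    public class AtomEntry {
      //serialize a single entry, e.g. for posting over the wire via the ATOM API
      public string ToXml() { return ""; }
    }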


     

    Categories: XML

    February 8, 2004
    @ 09:37 PM

    Lots of people seem to like the newest version of RSS Bandit. The most recent praise was the following post by Matt Griffith

    I've been a Bloglines user for almost a year. I needed a portable aggregator because I use several different computers. Then a few months ago I got a TabletPC. Now portability isn't as critical since I always have my Tablet with me. I stayed with Bloglines though because none of the client-side aggregators I tried before worked for me.

    I just downloaded the latest version of RSS Bandit. I love it. It is much more polished than it was the last time I tried it. Combine that with the dasBlog integration and the upcoming SIAM support and I'm in hog heaven. Thanks Dare, Torsten, and everyone else that helped make RssBandit what it is.

    Also it seems that at least one user liked RSS Bandit so much that he [or she] was motivated to write an article on Getting Started with RSS Bandit. It's definitely a good starting point and something I wouldn't mind seeing become part of the official documentation once it's been edited and more details have been fleshed out.

    Sweet.


     

    Categories: RSS Bandit

    A few weeks ago during the follow up to the WinFX review of the System.Xml namespace of the .NET Framework it was pointed out that our team hadn't provided guidelines for exposing and manipulating XML data in applications. At first, I thought the person who brought this up was mistaken but after a cursory search I realized the closest thing that comes to such a set of guidelines is Don Box's MSDN TV episode entitled Passing XML Data Inside the CLR. As good as Don's discussion is, a video stream isn't as accessible as a written article. In tandem with coming up with some of the guidelines for utilizing XML in the .NET Framework for internal purposes I'll put together an article based on Don's MSDN TV episode with an eye towards the next version of the .NET Framework.

    If you watched Don's talk and had any questions about it or require any clarifications respond below so I can clarify them in the article I plan to write.


     

    Categories: XML

    February 8, 2004
    @ 08:59 PM

    Dave Winer is going to be giving a talk at Microsoft Research tomorrow. Robert Scoble is organizing a lunch before the talk with Dave and some folks at MSFT. I may or may not make it since my mom's visiting from Nigeria and I was planning to take most of the week off. Just in case I miss it, there is one thing I'd like Dave to know: most of the problems in the XML-based website syndication space could have been solved if he didn't act as if, once he wrote a spec or code for the Radio Userland aggregator, it was impossible to change. Most of the supposed "problems" with RSS would take 30 minutes to fix in the spec and about a day to fix in the Radio Userland codebase (I'm making assumptions here based on how long it would take in the RSS Bandit codebase). Instead he stonewalled and now we have the ATOM mess. Of course, we'd still need something like the ATOM effort to bring the blogging APIs into the 21st century, but we wouldn't have to deal with incompatibilities at the website syndication level as well.

     

    In a recent blog post Dave mentions that his MSR talk will mainly be about the themes from his article Howard Dean is not a soap bar. I don't really have an opinion on the content one way or the other, but I did dislike the way he applies selective memory to prove a point. Specifically

    In the lead-up to the war in Iraq, for some reason, people who were against the war didn't speak.

    Maybe they didn't speak on the East Coast, but there was a very active anti-war movement on the West Coast, especially in the Seattle area. Actually, they did speak out on the East Coast as well; in fact hundreds of thousands of voices all over the US and all over the world spoke out.

    It makes me view the "blogs are the second coming" hype with suspicion when its boosters play fast and loose with the facts to sell their vision.


     

    Categories: Life in the B0rg Cube