February 10, 2004
@ 05:30 PM

Robert Scoble has a post entitled "Metadata without filling in forms? It's coming" where he writes:

Simon Fell read my interview about search trends and says "I still don't get it" about WinFS and metadata. He brings up a good point. If users are going to be forced to fill out metadata forms, like those currently in Office apps, they just won't do it. Fell is absolutely right. But, he assumed that metadata would need to be entered that way for every photo. Let's go a little deeper.... OK, I have 7400 photos. I have quite a few of my son. So, let's say there's a new kind of application. It recognizes the faces automatically and puts a square around them. Prompting you to enter just a name. When you do the square changes color from red to green, or just disappears completely.
A roadblock to getting that done today is that no one in the industry can get along for enough time to make it possible to put metadata into files the way it needs to be done. Example: look at the social software guys. Friendster doesn't play well with Orkut which doesn't play well with MyWallop, which doesn't play well with Tribe, which doesn't play well with ICQ, which doesn't play well with Outlook. What's the solution? Fix the platform underneath so that developers can put these features in without working with other companies and/or other developers they don't usually work with.

The way WinFS is being pitched by Microsoft folks reminds me a lot of Hailstorm [which is probably unsurprising since a number of Hailstorm folks work on it] in that there are a lot of interesting and useful technical ideas burdened by bad scenarios being hung on them. Before going into the interesting and useful technical ideas around WinFS I'll start with why I consider the two scenarios mentioned by Scoble as “bad scenarios”.

The idea that making the file system a metadata store automatically makes search better is a dubious proposition to swallow when you realize that a number of the searches that people can't do today wouldn't be helped much by more metadata. This isn't to say some searches wouldn't work better (e.g. searching for songs by title or artist). However, there are some search scenarios, such as finding a particular image or video among a bunch of image files with generic names or searching for a song by its lyrics, for which simply having the ability to tag media types with metadata doesn't seem like enough. Once your scenarios have to involve “face recognition software” or “cameras with GPS coordinates” in order to work, it is hard for people not to scoff. It's like a variation of the popular Slashdot joke:

  1. Add metadata search capabilities to file system
  2. ???
  3. You can now search for “all pictures taken on Tommy's 5th birthday party at the Chuck E Cheese in Redmond”.

with the ??? in the middle implying a significant difficulty in going from step 1 to 3.

The other criticism is that Robert's post implies that the reasons applications can't talk to each other are technical. This is rarely the case. The main reason applications don't talk to each other isn't a lack of technology [especially now that we have a well-defined format for exchanging data called XML] but various social and business reasons. There are no technical reasons MSN Messenger can't talk to ICQ or which prevent Yahoo! Messenger from talking to AOL Instant Messenger. It isn't technical reasons that prevent my data in Orkut from being shared with Friendster or my book & music preferences in Amazon from being shared with other online stores I visit. All of these entities feel they have a competitive advantage in making it hard to migrate from their platforms.

The two things Microsoft needs to do in this space are to (i) show how & why it is beneficial for different applications to share data locally and (ii) provide guidelines as well as best practices for applications to share their data in a secure manner.

While talking to Joshua Allen, Dave Winer, Robert Scoble, Lili Cheng, and Curtis Wong yesterday it seemed clear to me that social software [or if you are a business user; groupware that is more individual-focused which gives people more control over content and information sharing] would be a very powerful and useful tool for businesses and end users if built on a platform like Longhorn with a smart data store that knows how to create relationships between concepts as well as files (i.e. WinFS) and a flexible, cross platform distributed computing framework (i.e. Indigo).

The WinFS folks and Longhorn evangelists will probably keep focusing on what I have termed “bad scenarios” because they demo well but I suspect that there'll be difficulty getting traction with them in the real world. Of course, I may be wrong and the various people who've expressed incredulity at the current pitches are a vocal minority who'll be proved wrong once others embrace the vision. Either way, I plan to experiment with these ideas once Longhorn starts to beta and see where the code takes me.


Tuesday, February 10, 2004 8:34:24 PM (GMT Standard Time, UTC+00:00)

Your analogy reminds me of the classic Sidney Harris cartoon:


This cartoon was frequently displayed on the doors of my mathematics professors at university. I suppose they were trying to send the students a message. ;-)
Wednesday, February 11, 2004 2:48:28 AM (GMT Standard Time, UTC+00:00)
Very well written post, Dare. And you're expressing a lot of the thoughts I had after talking to Scoble. Maybe IFilter isn't dead after all.
Wednesday, February 11, 2004 6:57:29 PM (GMT Standard Time, UTC+00:00)
I think the thing that the "bad scenarios" demonstrate is that the problem was bounded by the form application itself. To simplify, the values entered are the "data". The fields in the schema are themselves a form of meta-data. The time/author of the change is both data and meta-data simultaneously.

Groove has been a platform that allows you to build these types of WinFS/Indigo applications for the past 3+ years. Guess what ... it's not like building client apps, server apps, or client/server apps. Data (and meta-data) change in ways (and have conflicts) that mere mortals often struggle with. And what if you want to integrate with existing server systems? When? How do you know that the union of all potential data changes that occurred within a particular time window has been transacted [because with async mode one source of data/meta-data may be offline]?

Users of the resulting apps -- well if they don't have knowledge that both data and meta-data are being replicated -- or for that matter even know what the heck it is we're talking about here. This is the design problem in my opinion.

So what do people do? They don't embrace the model ... but start using what they do know (i.e. dropping large, monolithic blobs such as a 100MB word document into a files tool) and wonder why Groove can't search it. Why, as if by magic, Groove wasn't able to only replicate the twenty paragraphs that changed on one particular endpoint, etc. Unfortunately, the finger is pointed at Groove. And will be pointed at Microsoft (eventually) when the user runs Longhorn but uses an application that doesn't embrace WinFS/Indigo (not that the users might know).

It's a slippery slope here. I'm definitely going to continue my attempts at building these types of application. It's just hard -- but at least the support will come from the underlying OS.
Friday, February 13, 2004 5:25:11 PM (GMT Standard Time, UTC+00:00)

"would be a very powerful and useful tool for businesses and end users if built on a platform like Longhorn with a smart data store that know how to create relationships between concepts as well as files (i.e. WinFS) and a flexible, cross platform distributed computing framework (i.e. Indigo). "

I don't buy this idea. This reminds me of an encyclopedia whose name I have forgotten. It would fill the screen with information overload. Every tiny little word, like 'a' and 'the', would be underlined and would link to something else, making the underlying text very hard to read, and the actual message completely lost in the information overload. If you let a smart algorithm (I don't know why you invoke Longhorn here) build the relationships between items for you, then it'll eventually become the total mess I am describing.
Stephane Rodriguez
Friday, February 13, 2004 9:54:11 PM (GMT Standard Time, UTC+00:00)
the real reason applications dont talk to each other is because of the real pain in the arse it is to define and implement a "contract" between the two parties... in spite of the fact that XML is well formed... XML is still too free-form... if I like attributes for simple data and Joe always likes elements then we're still not communicating.

the real key is loose binding, and probably lots of converting and parsing and interfacing between the applications that we use.
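To make the attributes-versus-elements point concrete, here is a minimal sketch in Python of the kind of loose binding described above. The two XML snippets and the `to_record` helper are made up for illustration; they are not taken from any real application's format. The idea is only that a consumer can accept either encoding by flattening both into the same record.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same photo metadata.
# One author prefers attributes, the other prefers child elements.
ATTRIBUTE_STYLE = '<photo title="Birthday" person="Tommy"/>'
ELEMENT_STYLE = "<photo><title>Birthday</title><person>Tommy</person></photo>"


def to_record(xml_text):
    """Flatten a one-level XML document into a dict, whichever style it uses."""
    root = ET.fromstring(xml_text)
    record = dict(root.attrib)      # pick up attribute-style fields
    for child in root:              # pick up element-style fields
        record[child.tag] = (child.text or "").strip()
    return record


# Both documents normalize to the same record.
print(to_record(ATTRIBUTE_STYLE) == to_record(ELEMENT_STYLE))
```

This only handles flat, one-level documents; nested data, namespaces, or disagreements about field names would still need an agreed contract or a real mapping layer.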

adobe's album should be able to just expose the meta data, and office should be able to consume it... winfs is a good start to that type of system.

people didnt use office's meta data because frankly it wasnt on the beaten path. and frankly most people just simply arent organized and dont care to be organized, until they want to find something.

so then you have a catch-22, where the only way to really break the cycle is for the application to suggest meta data (requiring some sort of conceptual thinking AI) when the file is created, and possibly updated when its saved.

the trick is to stop trying to organize everything in ways that people dont understand. most people have a certain place or room that they toss stuff that they dont care to see all the time... frankly i do it myself, a box of knick knacks.

current file systems all fail the user experience because they force people to think hierarchically... and most of us just plain dont care.

winfs will begin to alleviate the forced hierarchy by simply becoming a "knick knacks box" for all the files that you create.

once we get to that point, then some people will begin to build their OWN organizing system that they feel comfortable with, or maybe it'll always be in generic boxes that never get organized but at least they'll know where to look...

hence the reason why most peoples' C:\ directories are just plain filled with files, photos etc.

winfs is gonna help us the developers recognize where the user's knick knacks boxes are and help users deal with what really should be occupying their time... accomplishing the goals and tasks that they want to complete when they sit down to their computer.
Saturday, February 14, 2004 1:03:16 AM (GMT Standard Time, UTC+00:00)
Stephane wrote
>If you let a smart algorithm (I don't know why you invoke Longhorn here) build the relationships between items for you, then it'll eventually become the total mess I am describing

But I didn't say anything about the smart algorithms automatically creating links between concepts and files. All I want is the store to be able to express things like a 'contact' who various files and parts of files [emails, blog posts, articles, etc] on my harddrive are related to.
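As a rough illustration of what I mean (a sketch only, not WinFS's actual API or schema), here is a small Python model of a 'contact' that files and parts of files are related to. The `Item` and `Contact` classes, their fields, and the file paths are all hypothetical names invented for this example; the relationships are added explicitly by an application or user, not inferred by an algorithm.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    kind: str        # e.g. "email", "blog post", "article"
    path: str        # hypothetical location on the hard drive
    part: str = ""   # optional fragment, for a part of a file


@dataclass
class Contact:
    name: str
    related: list = field(default_factory=list)

    def relate(self, item):
        """Record an explicit relationship between this contact and an item."""
        self.related.append(item)

    def items_of_kind(self, kind):
        """Return the related items of one kind, in the order they were added."""
        return [item for item in self.related if item.kind == kind]


# Hypothetical usage: relate two items to one contact, then query by kind.
scoble = Contact("Robert Scoble")
scoble.relate(Item("email", "mail/inbox.pst", part="message 42"))
scoble.relate(Item("blog post", "blog/2004-02-10.html"))
print([item.path for item in scoble.items_of_kind("email")])
```

The point is that the store only needs to express and query such relationships; deciding which ones exist is left to the applications and the user.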