Metadata Quality, Events Databases and Live Clipboard

April 4, 2006

@ 02:18 AM

In his post Exploring Live Clipboard, Jon Udell shares a screencast he made about Live Clipboard. He writes:

I've been experimenting with microformats since before they were called that, and I'm completely jazzed about Live Clipboard. In this screencast I'll walk you through examples of Live Clipboard in use, show how the hCalendar payload is wrapped, grab hCalendar data from Upcoming and Eventful, convert it to iCalendar format for insertion into a calendar program, inject it natively into Live Clipboard, and look at Upcoming and Eventful APIs side-by-side.
All this leads up to a question: How can I copy an event from one of these services and paste it into another? My conclusion is that adopting Live Clipboard and microformats will be necessary but not sufficient. We'll also need a way to agree that, for example, this venue is the same as that venue. At the end, I float an idea about how we might work toward such agreements.
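Jon's screencast converts hCalendar markup from Upcoming and Eventful into iCalendar so a calendar program can import it. A minimal sketch of that kind of conversion, using only the Python standard library, might look like the following. It is not the code from the screencast. It assumes a single vevent, handles only the summary, location and dtstart properties (including the abbr/title pattern for dates), and the sample markup, PRODID and UID are hypothetical.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser

# The small subset of hCalendar properties this sketch understands.
HCAL_PROPS = {"summary", "location", "dtstart"}


class HCalendarParser(HTMLParser):
    """Collect a few properties from a single hCalendar vevent.

    Simplification: assumes well-formed markup without unclosed void
    elements such as <br>, which would unbalance the element stack.
    """

    def __init__(self):
        super().__init__()
        self.props = {}
        self._stack = []  # per open element: a property name, None, or "_consumed"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        prop = next((c for c in classes if c in HCAL_PROPS), None)
        if prop and tag == "abbr" and attrs.get("title"):
            # abbr pattern: the machine-readable value lives in the title
            # attribute, so the human-readable text inside is ignored.
            self.props[prop] = attrs["title"]
            prop = "_consumed"
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the nearest enclosing hCalendar property, if any.
        for prop in reversed(self._stack):
            if prop == "_consumed":
                return
            if prop:
                self.props[prop] = self.props.get(prop, "") + data
                return


def ical_text(value):
    """Collapse HTML whitespace, then escape TEXT per RFC 5545."""
    value = " ".join(value.split())
    return value.replace("\\", "\\\\").replace(";", "\\;").replace(",", "\\,")


def ical_datetime_prop(name, value):
    """ISO 8601 date or date-time -> iCalendar property line.

    Handles floating local times and a trailing 'Z'; numeric UTC offsets
    are NOT handled by this sketch.
    """
    basic = value.strip().replace("-", "").replace(":", "")
    if "T" not in basic:
        return f"{name};VALUE=DATE:{basic}"
    date, time = basic.split("T", 1)
    zulu = time.endswith("Z")
    time = time.rstrip("Z").ljust(6, "0")  # pad HHMM to HHMMSS
    return f"{name}:{date}T{time}{'Z' if zulu else ''}"


def fold(line, limit=75):
    """Fold long content lines (counts characters, so exact only for ASCII)."""
    parts = []
    while len(line) > limit:
        parts.append(line[:limit])
        line = " " + line[limit:]
    parts.append(line)
    return "\r\n".join(parts)


def to_ics(props, uid):
    """Wrap the collected properties in a minimal VCALENDAR; dtstart is required."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//hcalendar-sketch//EN",  # hypothetical PRODID
             "BEGIN:VEVENT", f"UID:{uid}", f"DTSTAMP:{stamp}",
             ical_datetime_prop("DTSTART", props["dtstart"])]
    for prop in ("summary", "location"):
        if prop in props:
            lines.append(f"{prop.upper()}:{ical_text(props[prop])}")
    lines += ["END:VEVENT", "END:VCALENDAR"]
    return "\r\n".join(fold(line) for line in lines) + "\r\n"


# Hypothetical sample markup.
sample = """
<div class="vevent">
  <span class="summary">Example concert</span> at
  <span class="location">Colonial Theater, NH</span> on
  <abbr class="dtstart" title="2006-04-05T19:00">April 5, 7pm</abbr>
</div>
"""
parser = HCalendarParser()
parser.feed(sample)
parser.close()
print(to_ics(parser.props, uid="example-event-1@example.org"))
```

Getting from hCalendar to iCalendar is mostly mechanical; as Jon notes, the harder part is deciding whether two services' events and venues are the same thing.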

The problem Jon Udell describes is a classic one when mapping data between different domains. I posted about this a few months ago in Metadata Quality and Mapping Between Domain Languages, where I wrote:

The problem Stefano has pointed out is that just being able to say that two items are semantically identical (i.e. an artist field in dataset A is the same as the 'band name' field in dataset B) doesn't mean you won't have to do some syntactic mapping as well (i.e. alter artist names of the form "ArtistName, The" to "The ArtistName") if you want an accurate mapping.
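The distinction between semantic and syntactic mapping can be made concrete with a small sketch. Here a field map records that dataset A's artist field means the same as dataset B's 'band name' field, and a separate normalization step rewrites "ArtistName, The" to "The ArtistName". Both schemas and the helper names are hypothetical.

```python
import re

# Semantic mapping: field in dataset A -> equivalent field in dataset B.
# Both schemas are hypothetical, following the artist / 'band name' example.
FIELD_MAP = {"artist": "band name"}


def normalize_artist(name):
    """Syntactic mapping: 'ArtistName, The' -> 'The ArtistName'."""
    match = re.fullmatch(r"\s*(.+?)\s*,\s*(the)\s*", name, flags=re.IGNORECASE)
    if match:
        return f"The {match.group(1)}"
    return name.strip()


def map_record(record_a):
    """Rename fields via FIELD_MAP, then fix up values whose syntax differs."""
    record_b = {}
    for field, value in record_a.items():
        if field == "artist":
            value = normalize_artist(value)
        record_b[FIELD_MAP.get(field, field)] = value
    return record_b


print(map_record({"artist": "Beatles, The"}))
# {'band name': 'The Beatles'}
```

Knowing that the two fields are equivalent only gets you as far as the rename; the value rewrite still has to be written by hand for each quirk of each dataset.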

This is the big problem with data mapping. In Jon's example, the venue is called Colonial Theater in Upcoming and Colonial Theater (New Hampshire) in Eventful. Eventful has a street address while Upcoming only provides the street name. Little differences like these are what make data mapping a hard problem.

Jon's solution is for the community to come up with global identifiers for venues as tags (e.g. Colonial_Theater_NH_03431) instead of waiting for technologists to come up with a solution. That's good advice, because there really isn't a good technological solution for this problem. Even RDF/Semantic Web junkies like Danny Ayers, in posts like Live clipboard and identifying things, start with assumptions like every venue has a unique identifier which is its URI. Of course this ignores the fact that coming up with a global, unique identification scheme for the Web is the problem in the first place.

The problem with Jon's approach is the same one pointed out in almost every critique of folksonomies: people won't use the same tags for the same concept. Jon might use Colonial_Theater_NH_03431 while I use Colonial_Theater_95_Maine_Street_NH_03431, which leaves us with the same problem of inconsistent identifiers being used for the same venue.
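Without shared identifiers, matching the Upcoming and Eventful venues comes down to heuristics. A minimal sketch of one such heuristic follows: normalize the names (dropping parenthetical qualifiers like "(New Hampshire)"), require them to be near-identical, and require postal codes to agree when both records have one. The record fields, the 0.9 threshold and the function names are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize_venue_name(name):
    """Drop parenthetical qualifiers and punctuation, lowercase, collapse spaces."""
    name = re.sub(r"\([^)]*\)", " ", name)
    name = re.sub(r"[^a-z0-9]+", " ", name.lower())
    return " ".join(name.split())


def probably_same_venue(a, b, threshold=0.9):
    """Heuristic match between two venue records (hypothetical field names).

    Names must be near-identical after normalization; if both records carry
    a postal code, the codes must agree as well.
    """
    ratio = SequenceMatcher(None, normalize_venue_name(a["name"]),
                            normalize_venue_name(b["name"])).ratio()
    if ratio < threshold:
        return False
    zip_a, zip_b = a.get("postal_code"), b.get("postal_code")
    return zip_a is None or zip_b is None or zip_a == zip_b


upcoming = {"name": "Colonial Theater"}
eventful = {"name": "Colonial Theater (New Hampshire)", "postal_code": "03431"}
print(probably_same_venue(upcoming, eventful))  # True
```

The weakness is exactly the one described above: the Upcoming record has no postal code, so it would match a same-named venue in any other town just as readily.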

I expect that for the near future we will continue to see custom code being written to make data integration across domains work. Unfortunately, no developments on the horizon look promising in making this problem go away.

PS: Ray Ozzie describes some of the recent developments in the world of Live Clipboard in his post Wiring Progress; check it out.

Categories: Technology | Web Development

Tracked by:
"Mapping Data Between Domains : Are We Trying Too Hard, And Simply Overlooking T... [Trackback]


Tuesday, 04 April 2006 04:51:25 (GMT Daylight Time, UTC+01:00)

"People won't use the same tags for the same concept."

Most people won't go to the trouble of doing this either:

http://en.wikipedia.org/wiki/Abelson

But a few will.

That link goes to a disambiguation page that a handful of folks have chosen to maintain.

I'm merely suggesting it's conceivable that in a similar fashion, a few folks (for many topics) will find it interesting or useful to apply the reverse operation -- e.g., by linking things related to this Hal Abelson:

http://en.wikipedia.org/wiki/Hal_Abelson

back to that URI, or alternatively by threading them together with a tag that they collectively agree will represent Hal Abelson the computer scientist as opposed to all other computer scientists.

It's a stretch, I'll admit, but stranger things have happened.

Jon Udell
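Jon's reverse-disambiguation idea amounts to a shared alias table: many local tags pointing at one agreed identifier, such as a Wikipedia URL. A minimal sketch follows. The table contents, the canonical venue URI and the helper names are hypothetical, and the code only helps once someone has actually maintained the table.

```python
# Hypothetical canonical URI for the venue in Jon's example.
COLONIAL_NH = "http://example.org/venues/colonial-theater-nh-03431"
ABELSON = "http://en.wikipedia.org/wiki/Hal_Abelson"


def _key(tag):
    """Tags are compared case-insensitively, ignoring surrounding spaces."""
    return tag.strip().lower()


# Hypothetical community-maintained table: many local tags -> one agreed identifier.
ALIASES = {
    _key(tag): uri
    for tag, uri in [
        ("Colonial_Theater_NH_03431", COLONIAL_NH),
        ("Colonial_Theater_95_Maine_Street_NH_03431", COLONIAL_NH),
        ("hal_abelson", ABELSON),
        ("harold_abelson", ABELSON),
    ]
}


def resolve(tag):
    """Return the agreed identifier for a tag, or None if nobody has mapped it yet."""
    return ALIASES.get(_key(tag))


def same_thing(tag_a, tag_b):
    """True only when both tags have been mapped to the same identifier."""
    uri_a, uri_b = resolve(tag_a), resolve(tag_b)
    return uri_a is not None and uri_a == uri_b


print(same_thing("Colonial_Theater_NH_03431",
                 "colonial_theater_95_maine_street_nh_03431"))  # True
```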

Tuesday, 04 April 2006 11:17:26 (GMT Daylight Time, UTC+01:00)

Grrrr. I didn't "...start with assumptions like every venue has a unique identifier which is its URI." For the first half of my post I talked about identifying things by description, where things *don't* have a URI.

"Of course this ignores the fact that coming up with a global, unique identification scheme for the Web is the problem in the first place." The Web already has a global unique identification scheme, which is why I concluded that it'd probably make pragmatic sense to use it (URIs) to identify things like events.

"Unfortunately, no developments on the horizon look promising in making this problem go away." Try looking at a different horizon... Semantic Web technologies have been addressing exactly these issues head-on for a few years now, and many of the problems have largely been solved. If you want to integrate heterogeneous data on the Web, you should check out RDF and OWL; that's essentially what they've been designed for. All I'm saying ;-)

Danny
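In OWL, owl:sameAs is symmetric and transitive, so once identifiers from different services are declared equal they fall into equivalence classes. The following plain-Python sketch (no RDF library) shows that inference with a union-find structure; it is an illustration of the semantics, not how any particular RDF store implements it. The Upcoming, Eventful and canonical URIs are hypothetical, and the sketch does not address who publishes the sameAs statements in the first place.

```python
class SameAs:
    """Union-find over identifiers; each assert_same() call plays the role of
    one owl:sameAs statement, and equality follows symmetrically and transitively."""

    def __init__(self):
        self._parent = {}

    def _find(self, uri):
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited identifier straight at the root.
        while self._parent[uri] != root:
            next_uri = self._parent[uri]
            self._parent[uri] = root
            uri = next_uri
        return root

    def assert_same(self, a, b):
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_a] = root_b

    def same(self, a, b):
        return self._find(a) == self._find(b)


# Hypothetical identifiers for the same venue on two services.
UPCOMING = "http://upcoming.example/venue/123"
EVENTFUL = "http://eventful.example/venues/V0-001"
CANONICAL = "http://example.org/venues/colonial-theater-nh-03431"

graph = SameAs()
graph.assert_same(UPCOMING, CANONICAL)
graph.assert_same(EVENTFUL, CANONICAL)
print(graph.same(UPCOMING, EVENTFUL))  # True, inferred via transitivity
```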

Tuesday, 04 April 2006 15:28:17 (GMT Daylight Time, UTC+01:00)

Another group that has been addressing this problem is the Topic Maps community.

Take a look at:
http://www.mondeca.com/lab/bernard/hubjects.pdf

Guy

Tuesday, 04 April 2006 17:37:29 (GMT Daylight Time, UTC+01:00)

Jon,
I hadn't considered the wikipedia disambiguation model being used to solve the problem from a bottom up perspective. It could work, stranger things have happened. :)

Danny,
I'd be interested in how one could use RDF and OWL as they exist today to support the real-world use case of cutting and pasting events between Upcoming and Eventful. A follow-up blog post from you on the topic would be educational for me, and I suspect for a lot of others as well.

Dare Obasanjo

Tuesday, 04 April 2006 17:58:58 (GMT Daylight Time, UTC+01:00)

What if everything were put on a real map? A real-world map. All related events would fall together.

The problem can be solved using user interfaces that are mapped to real-world paradigms: maps, calendars, people.

Vishi Gondi
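If every event carried coordinates, "falling together" on a map could be approximated by distance. A minimal sketch follows. It assumes each event record has hypothetical lat/lon fields in degrees, uses the haversine great-circle formula on a spherical Earth, and groups greedily with an arbitrary 50-metre radius. It sidesteps name mismatches but not geocoding differences, and two distinct venues in one building would be merged.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points (spherical Earth)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def group_by_location(events, radius_m=50):
    """Greedy grouping: each event joins the first group whose first event is within radius_m."""
    groups = []
    for event in events:
        for group in groups:
            anchor = group[0]
            if haversine_m(anchor["lat"], anchor["lon"],
                           event["lat"], event["lon"]) <= radius_m:
                group.append(event)
                break
        else:
            groups.append([event])
    return groups


# Hypothetical events: the first two are about 11 m apart, the third far away.
events = [
    {"name": "a", "lat": 42.9, "lon": -72.3},
    {"name": "b", "lat": 42.9001, "lon": -72.3},
    {"name": "c", "lat": 43.5, "lon": -72.3},
]
print([[e["name"] for e in g] for g in group_by_location(events)])  # [['a', 'b'], ['c']]
```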

Tuesday, 04 April 2006 18:00:48 (GMT Daylight Time, UTC+01:00)

But another problem comes up: which map to use? Google Maps, Yahoo Maps or Live Maps?

What we really need is a standard way to put stuff on these maps.

Vishi Gondi

Dare Obasanjo's weblog