A number of people have been riffing on how “Web 2.0” is the new vendor lock-in. The week started with a post by Alex Iskold entitled Towards the Attention Economy: Will Attention Silos Ever Open Up? where he wrote

At a quick glance there may be nothing wrong with the way things are today. For example, you can log in to Amazon and see your order history, you can see what you rented on Netflix or what you bought on eBay. The problem is that the information is not readily portable and not readily available via a common interface. Because of this, managing your attention information is practically impossible.

Consider a different industry - banking. Each bank makes your recent financial transactions exportable in a few formats - pdf, comma separated, Excel, etc. An export in Excel is actually an interesting example, because it illustrates how your information can be leveraged. By exporting information from your bank and credit card into Excel you are able to take it to your financial adviser who can in turn analyze it. The point is that your financial information is portable.

On the other hand your Netflix rental history is not. You can argue that it is possible to copy and paste it out of Netflix, but the cost of doing this is prohibitive for individuals.

Of course, not every “Web 2.0” company is like Netflix and some do provide APIs for getting out your data. But I think Mark Pilgrim has a great point in his post Let’s not and say we did where he writes

Praising companies for providing APIs to get your own data out is like praising auto companies for not filling your airbags with gravel. I’m not saying data export isn’t important, it’s just aiming kinda low. You mean when I give you data, you’ll… give it back to me? People who think this is the pinnacle of freedom aren’t really worth listening to. Please, we need a Free Data movement. (Yeah I know, Tim predicted it already. I was the one who told him, at FOO Camp the month before.)

Back in the day, I thought Steve Gillmor’s AttentionTrust was a step in the direction of a Free Data movement, but since then all I’ve seen out of that crowd has been either irrelevant (e.g. XML formats that replace OPML blogrolls) or ill thought out (e.g. attempting to create "business opportunities" by forming companies that act as middlemen reselling your data to the Amazons and Netflixes of the world, kinda like Microsoft's Hailstorm vision). I keep wondering if we’ll ever see this Free Data movement. However, there is another problem we have to face even if a Free Data movement does take hold.

In a follow-up post to the piece by Alex Iskold entitled Attention mashups, Dave Winer gets to the heart of the matter in his characteristically blunt style when he writes

But whose data is it??

Seems it belongs to the users and they should be able to take it where they want. Sure Yahoo is providing a recommendation engine, that's nice (and thanks), but they also get to use my data for their own purposes. Seems like a fair trade. And I'm a paying customer of Netflix. They just lowered the price but I'd much rather have gotten a dividend in the form of being able to use my own data.

Think of the mashups that would be possible.

Wouldn't it be great to link up Match.com with movie ratings to find dates that like the same movies?

One of the bitter truths about "Web 2.0" is that your data isn't all that interesting; our data, on the other hand, is very interesting. Dave Winer’s mashup example isn’t interesting because he wants to be able to get his data out of Netflix but because he wants to be able to combine his data with everybody else’s data. This is where our “potential” Free Data movement will run into problems. The first is that a lot of “Web 2.0” websites provide value to their users via wisdom-of-the-crowds approaches such as tagging or recommendations, which are simply not possible with a single user’s data set or with a small set of users. This leads to a tendency for the rich to get richer: the sites with the most data provide the most value to end users (e.g. Amazon). Another problem is that social software leads to lock-in. My buddy list on Windows Live Messenger and my list of friends in Facebook are useless to me outside the context of these applications. Although I can get all of my history and data out of these services, I lose the value I get from the fact that all my friends use these services as well. Again, my data isn’t what is interesting.

Being able to get your data out via APIs is a good first step, but what is really interesting is being able to get everyone else’s data out of the service as well. Then we would have the beginnings of truly open and free data on the Web, which would lead to very, very interesting possibilities.

Now playing: Rick Ross - Hustlin' (remix) (feat. Jay-Z & Young Jeezy)



Friday, August 3, 2007 9:55:34 PM (GMT Daylight Time, UTC+01:00)
>>what is really interesting is being able to get everyone else's data out of the service as well.
This will have privacy implications. AOL tried to share their search history and it ended in disaster. This may discourage companies from providing users' data.
Friday, August 3, 2007 10:28:24 PM (GMT Daylight Time, UTC+01:00)
I agree that overall that might eventually happen. And this is great for companies like Netflix. But it should be our choice who to trust with our information.

Saturday, August 4, 2007 3:07:01 AM (GMT Daylight Time, UTC+01:00)
This made me think about how open source (or rather, the principles behind open source) works in a world where more and more stuff happens in the cloud.

See http://www.sriramkrishnan.com/blog/2007/08/open-source-and-scratching-itches-in.html
Saturday, August 4, 2007 3:26:38 PM (GMT Daylight Time, UTC+01:00)
Somewhat related, people as OPML to further the cause of open social networks:

Saturday, August 4, 2007 5:29:07 PM (GMT Daylight Time, UTC+01:00)

I concur with your general points of view re. "Openness" but please assure me that you do not find these insights incompatible with the very notion that underlies the Semantic Data Web vision: a Web of openly accessible interlinked Data (as per http://linkeddata.org).

I have written about Data Openness for eons (internet time), and there are several blog posts/demos I've put out (including one that frees up Facebook profile and album data) that demonstrate these points.

Every point of presence on the web is ultimately a Semantic Data Web Data Source. The main stumbling blocks being: at what cost?

I have spent a lot of time and energy ensuring that there isn't an "RDF Tax". By this I mean: RDF should be generated on the fly from: (X)HTML (GRDDL and other means), Web Services (see our RDF Sponger Middleware and Dave Beckett's Triplr), and traditional enterprise databases (de-normalization via SQL to RDF mapping; basically making the Conceptual Model real as per ADO.vNext).
Sunday, August 5, 2007 6:51:01 PM (GMT Daylight Time, UTC+01:00)
If Microsoft really cared about the users like Apple does, or wanted to open up the data generally, they'd put some pressure where it matters.


Take a page from what Apple did by offering a tool that lets you select sections of web pages and then turns that selection into a script that automates the screen scraping of the data into some XML format for mashup/gadget or other consumption.

Obviously if this was easy enough for the simpler scenarios of the kind that Apple demoed in the keynote and directly from IE, it would pressure companies either to constantly change their layouts or provide their own gadgets and APIs.

How to give the incentive+pressure to change toward providing APIs along with html?

Instead of scraping directly from the HTML every time, the scrape engine would, on the first scrape/selection, both scrape the HTML for the data and create a hash from the static surroundings of the data (e.g. CSS attributes, class, table location), then use that hash to query for a pure XML output of the data.

Now if the company providing the data is smart, they'll take note of what data is requested in HTML form associated with this hash. Then they simply create a new page that answers this hash-based query and gives the data in XML form. At this point the scrape engine will start using the XML output instead of requesting the full HTML page for scraping, and the company is free to modify the appearance and layout of the HTML while the gadgets using the hash keep working, assuming the company keeps it available (and they would, since otherwise they'd face millions of these scrape engines DDoSing their servers for that data constantly in HTML form).
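
The scheme described above could be sketched roughly as follows. This is only an illustration of the idea, not an existing tool: the function names, the choice of SHA-256, the fingerprint fields, and the injected `query_xml` / `scrape_html` callables are all assumptions made for the example.

```python
import hashlib
from typing import Callable, Optional, Tuple


def selection_fingerprint(tag: str, css_class: str, table_position: str) -> str:
    """Hash the static context around a selection, not the data itself.

    The three fields are hypothetical stand-ins for "CSS attributes,
    class, table location"; a real engine would pick its own.
    """
    context = "|".join([tag.lower(), css_class, table_position])
    return hashlib.sha256(context.encode("utf-8")).hexdigest()[:16]


def get_data(
    fingerprint: str,
    query_xml: Callable[[str], Optional[str]],
    scrape_html: Callable[[], str],
) -> Tuple[str, str]:
    """Prefer the site's XML answer for this fingerprint, else scrape.

    query_xml returns None when the site does not (yet) answer
    hash-based queries; in that case we fall back to the HTML scrape.
    """
    xml = query_xml(fingerprint)
    if xml is not None:
        return ("xml", xml)
    return ("html", scrape_html())


# Hypothetical usage: the site has no XML endpoint yet, so we scrape.
fp = selection_fingerprint("td", "rental-title", "history/col2")
source, data = get_data(fp, lambda h: None, lambda: "<td>Some Movie</td>")
```

Because the fingerprint depends only on the static surroundings, the site can later start answering that hash with XML and restyle its HTML without breaking the gadget.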

So there, if Microsoft wants to make web sites open their data, they can by incorporating this feature in the next IE.
Sunday, August 5, 2007 6:59:52 PM (GMT Daylight Time, UTC+01:00)
And if you don't think that'd work - believe me it will work:

I've seen this happen quite a few times. First a user creates a program to make use of the company's data, then many users adopt this program, and suddenly the company is bombarded with requests for the data from the application. They realize there's a need for this and that the current method eats a lot of bandwidth and resources, and they end up offering some leaner XML output of the data.