October 19, 2008
@ 08:47 AM

Tim Bray has a thought provoking post on embracing cloud computing entitled Get In the Cloud where he brings up the problem of vendor lock-in. He writes

Tech Issue · But there are two problems. The small problem is that we haven’t quite figured out the architectural sweet spot for cloud platforms. Is it Amazon’s EC2/S3 “Naked virtual whitebox” model? Is it a Platform-as-a-service flavor like Google App Engine? We just don’t know yet; stay tuned.

Big Issue · I mean a really big issue: if cloud computing is going to take off, it absolutely, totally, must be lockin-free. What that means if that I’m deploying my app on Vendor X’s platform, there have to be other vendors Y and Z such that I can pull my app and its data off X and it’ll all run with minimal tweaks on either Y or Z.

At the moment, I don’t think either the Amazon or Google offerings qualify.

Are we so deranged here in the twenty-first century that we’re going to re-enact, wide-eyed, the twin tragedies of the great desktop-suite lock-in and the great proprietary-SQL lock-in? You know, the ones where you give a platform vendor control over your IT budget? Gimme a break.

I’m simply not interested in any cloud offering at any level unless it offers zero barrier-to-exit.

Tim's post is about cloud platforms but I think it is useful to talk about avoiding lock-in when taking a bet on cloud based applications as well as when embracing cloud based platforms. This is especially true when you consider that moving from one application to another is a similar yet smaller scoped problem compared to moving from one Web development platform to another.

So let's say your organization wants to move from a cloud based office suite like Google Apps for Business to Zoho. The first question you have to ask yourself is whether it is possible to extract all of your organization's data from one service and import it without data loss into another. For business documents this should be straightforward thanks to standards like ODF and OOXML. However there are points to consider such as whether there is an automated way to perform such bulk imports and exports or whether individuals have to manually export and/or import their online documents to these standard formats. Thus the second question is how expensive it is for your organization to move the data. The cost includes everything from the potential organizational downtime to account for switching services to the actual IT department cost of moving all the data. At this point, you then have to weigh the impact of all the links and references to your organization's data that will be broken by your switch. I don't just mean links to documents returning 404 because you have switched from being hosted at google.com to zoho.com but more insidious problems like the broken experience of anyone who is using the calendar or document sharing feature of the service to give specific people access to their data. Also you have to ensure that email that is sent to your organization after the switch goes to the right place. Making this aspect of the transition smooth will likely be the most difficult part of the migration since it requires more control over application resources than application service providers typically give their customers. Finally, you will have to evaluate which features you will lose by switching applications and ensure that none of them is mission critical to your business.

Despite all of these concerns, switching hosted application providers is mostly a tractable problem. Standard data formats make data migration feasible although it might be unwieldy to extract the data from the service. In addition, Internet technologies like SMTP and HTTP all have built in ways to handle forwarding/redirecting references so that they aren't broken. However although the technology makes it possible, the majority of hosted application providers fall far short of making it easy to fully migrate to or away from their service without significant effort.

When it comes to cloud computing platforms, you have all of the same problems described above and a few extra ones. The key wrinkle with cloud computing platforms is that there is no standardization of the APIs and platform technologies that underlie these services. The APIs provided by Amazon's cloud computing platform (EC2/S3/EBS/etc) are radically different from those provided by Google App Engine (Datastore API/Python runtime/Images API/etc). For zero lock-in to occur in this space, there need to be multiple providers of the same underlying APIs. Otherwise, migrating between cloud computing platforms will be more like switching your application from Ruby on Rails and MySQL to Django and PostgreSQL (i.e. a complete rewrite).

In response to Tim Bray's post, Dewitt Clinton of Google left a comment which is excerpted below

That's why I asked -- you can already do that in both the case of Amazon's services and App Engine. Sure, in the case of EC2 and S3 you'll need to find a new place to host the image and a new backend for the data, but Amazon isn't trying to stop you from doing that. (Actually not sure about the AMI format licensing, but I assumed it was supposed to be open.)

In App Engine's case people can run the open source userland stack (which exposes the API you code to) on other providers any time the want, and there are plenty of open source bigtable implementations to chose from. Granted, bulk export of data is still a bit of a manual process, but it is doable even today and we're working to make it even easier.

Ae you saying that lock-in is avoided only once the alternative hosts exist?

But how does Amazon or Google facilitate that, beyond getting licensing correct and open sourcing as much code as they can? Obviously we can't be the ones setting up the alternative instances. (Though we can cheer for them, like we did when we saw the App Engine API implemented on top of EC2 and S3.)

To Doug Cutting's very good point, the way Amazon and Google (and everyone else in this space) seem to be trying to compete is by offering the best value, in terms of reliability (uptime, replication) and performance (data locality, latency, etc) and monitoring and management tools. Which is as it should be.

Although Dewitt is correct that Google and Amazon are not explicitly trying to lock-in customers to their platform, the fact is that today if a customer has heavily invested in either platform then there isn't a straightforward way for customers to extricate themselves from the platform and switch to another vendor. In addition there is not a competitive marketplace of vendors providing standard/interoperable platforms as there are with email hosting or Web hosting providers.

As long as these conditions remain the same, it may be that lock-in is too strong a word describe the situation but it is clear that the options facing adopters of cloud computing platforms aren't great when it comes to vendor choice.

Note Now Playing: Britney Spears - Womanizer Note