I've been thinking a little bit about Google Gears recently and after reading the documentation things I've realized that making a Web-based application that works well offline poses an interesting set of challenges. First of all, let's go over what constitutes the platform that is Google Gears. It consists of three components

  • LocalServer: Allows you to cache and serve application resources such as HTML pages, scripts, stylesheets and images from a local web server. 

  • Database: A relational database where the application can store data locally. The database supports both full-text and SQL queries.

  • WorkerPool: Allows applications to perform I/O expensive tasks in the background and thus not lock up the browser. A necessary evil. 

At first, this seemed like a lot to functionality being offered by Google Gears until I started trying to design how I'd take some of my favorite Web applications offline. Let's start with a straightforward case such as Google Reader. The first thing you have to do is decide what data needs to be stored locally when the user decides to go offline. Well, a desktop RSS reader has all my unread items even when I go offline so a user may expect that if they go offline in Google Reader this means all their unread items are offline. This could potentially be a lot of data to transfer in the split instant between when the user selects "go offline" in the Google Reader interface and she actually loses her 'net connection by closing her laptop. There are ways to work around this such as limiting how many feeds are available offline (e.g. Robert Scoble with a thousand feeds in his subscription list won't get to take all of them offline) or by progressively downloading all the unread content while the user is viewing the content in online mode. Let's ignore that problem for now because it isn't that interesting.

The next problem is to decide which state changes while the app is offline need to be reported back when the user gets back online. These seem to be quite straightforward,

  • Feed changed
    • Feed added
    • Feed deleted
    • Feed renamed
    • Feed moved
  • News item changed
    • Item marked read/unread
    • Item flagged/starred
    • Item tag updated

The application code can store these changes as a sequential list of modifications which are then executed whenever the user gets back online. Sounds easy enough. Or is it?

What happens if I'm on my laptop and I go offline in Google Reader and mark a bunch of stuff as read then unsubscribe from a few feeds I no longer find interesting. The next day when I get to work, I go online on my desktop, read some new items and subscribe to some new feeds. Later that day, I go online with my laptop. Now the state on my laptop is inconsistent from that on the Web server. How do we reconcile these differences?

The developers at Google have anticipated these questions and have answered them in Google Gears documentation topic titled Choosing an Offline Application Architecture which states

No matter which connection and modality strategy you use, the data in the local database will get out of sync with the server data. For example, local data and server data get out of sync when:

  • The user makes changes while offline
  • Data is shared and can be changed by external parties
  • Data comes from an external source, such as a feed

Resolving these differences so that the two stores are the same is called "synchronization". There are many approaches to synchronization and none are perfect for all situations. The solution you ultimately choose will likely be highly customized to your particular application.

Below are some general synchronization strategies.

Manual Sync

The simplest solution to synchronization is what we call "manual sync". It's manual because the user decides when to synchronize. It can be implemented simply by uploading all the old local data to the server, and then downloading a fresh copy from the server before going offline.

Background Sync

In a "background sync", the application continuously synchronizes the data between the local data store and the server. This can be implemented by pinging the server every once in a while or better yet, letting the server push or stream data to the client (this is called Comet in the Ajax lingo).

I don't consider myself some sort of expert on data synchronization protocols but it seems to me that there is a lot more to figuring out a data synchronization strategy than whether it should be done based on user action or automatically in the background without user intervention. It seems that there would be all sorts of decisions around consistency models and single vs. multi-master designs that developers would have to make as well. And that's just for a fairly straightforward application like Google Reader. Can you imagine what it would be like to use Google Gears to replicate the functionality of Outlook in the offline mode of Gmail or to make Google Docs & Spreadsheets behave properly when presented with conflicting versions of a document or spreadsheet because the user updated it from the Web and in offline mode?  

It seems that without providing data synchronization out of the box, Google Gears leaves the most difficult and cumbersome aspect of building a disconnected Web app up to application developers. This may be OK for Google developers using Google Gears since the average Google coder is a Ph.D but the platform isn't terribly useful to Web application developers who want to use it for anything besides a super-sized HTTP cookie. 

A number of other bloggers such as Roger Jennings and Tim Anderson have also pointed that the lack of data synchronization in Google Gears is a significant oversight. If Google intends for Google Gears to become a platform that will be generally useful to the average Web developer then the company will have to fix this oversight. Otherwise, they haven't done as much for the Web development world as the initial hype led us to believe.