I've been thinking a little bit about Google Gears recently, and after reading the documentation I've realized that making a Web-based application that works well offline poses an interesting set of challenges. First of all, let's go over what constitutes the platform that is Google Gears. It consists of three components:

  • LocalServer: Allows you to cache and serve application resources such as HTML pages, scripts, stylesheets and images from a local web server. 

  • Database: A relational database where the application can store data locally. The database supports both full-text and SQL queries.

  • WorkerPool: Allows applications to perform I/O-intensive tasks in the background and thus not lock up the browser. A necessary evil. 

At first, this seemed like a lot of functionality being offered by Google Gears until I started trying to design how I'd take some of my favorite Web applications offline. Let's start with a straightforward case such as Google Reader. The first thing you have to do is decide what data needs to be stored locally when the user decides to go offline. Well, a desktop RSS reader has all my unread items even when I go offline, so a user may expect that going offline in Google Reader means all their unread items are available offline. This could potentially be a lot of data to transfer in the brief moment between when the user selects "go offline" in the Google Reader interface and when she actually loses her 'net connection by closing her laptop. There are ways to work around this, such as limiting how many feeds are available offline (e.g. Robert Scoble, with a thousand feeds in his subscription list, won't get to take all of them offline) or progressively downloading all the unread content while the user is viewing the content in online mode. Let's ignore that problem for now because it isn't that interesting.

The next problem is to decide which state changes made while the app is offline need to be reported back when the user gets back online. These seem to be quite straightforward:

  • Feed changed
    • Feed added
    • Feed deleted
    • Feed renamed
    • Feed moved
  • News item changed
    • Item marked read/unread
    • Item flagged/starred
    • Item tag updated

The application code can store these changes as a sequential list of modifications which are then executed whenever the user gets back online. Sounds easy enough. Or is it?
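To make the idea concrete, here is a minimal Python sketch of such a change queue, assuming a hypothetical `send` callback that pushes one operation to the server and reports whether it was accepted. The `Op` and `ChangeLog` names and the operation kinds are made up for illustration; nothing here is a Google Gears or Google Reader API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Op:
    kind: str                    # hypothetical kinds, e.g. "mark_read", "star", "unsubscribe"
    target: str                  # item or feed identifier
    value: Optional[str] = None  # extra payload, e.g. the new name for a "rename_feed" op


@dataclass
class ChangeLog:
    pending: List[Op] = field(default_factory=list)

    def record(self, kind: str, target: str, value: Optional[str] = None) -> None:
        """Append a change made while offline, preserving the order it happened in."""
        self.pending.append(Op(kind, target, value))

    def flush(self, send: Callable[[Op], bool]) -> int:
        """Replay queued changes in order; stop at the first one `send` fails to deliver."""
        sent = 0
        while self.pending:
            if not send(self.pending[0]):
                break  # still offline or rejected: keep this op and all later ones queued
            self.pending.pop(0)
            sent += 1
        return sent
```

Stopping at the first failure keeps the remaining operations in their original order, which matters once one operation depends on another (e.g. renaming a feed that was added while offline).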

What happens if I'm on my laptop, go offline in Google Reader, mark a bunch of stuff as read and then unsubscribe from a few feeds I no longer find interesting? The next day when I get to work, I go online on my desktop, read some new items and subscribe to some new feeds. Later that day, I go online with my laptop. Now the state on my laptop is inconsistent with that on the Web server. How do we reconcile these differences?

The developers at Google have anticipated these questions and have answered them in the Google Gears documentation topic titled Choosing an Offline Application Architecture, which states:

No matter which connection and modality strategy you use, the data in the local database will get out of sync with the server data. For example, local data and server data get out of sync when:

  • The user makes changes while offline
  • Data is shared and can be changed by external parties
  • Data comes from an external source, such as a feed

Resolving these differences so that the two stores are the same is called "synchronization". There are many approaches to synchronization and none are perfect for all situations. The solution you ultimately choose will likely be highly customized to your particular application.

Below are some general synchronization strategies.

Manual Sync

The simplest solution to synchronization is what we call "manual sync". It's manual because the user decides when to synchronize. It can be implemented simply by uploading all the old local data to the server, and then downloading a fresh copy from the server before going offline.
...

Background Sync

In a "background sync", the application continuously synchronizes the data between the local data store and the server. This can be implemented by pinging the server every once in a while or better yet, letting the server push or stream data to the client (this is called Comet in the Ajax lingo).
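The two strategies can share the same sync routine and differ only in who triggers it. The sketch below assumes a toy key-value data model and a hypothetical in-memory `FakeServer`; `sync_once` is what a "Sync now" button would call, while `BackgroundSyncer` calls it on a timer from a background thread. All names are illustrative, not part of Gears.

```python
import threading
from typing import Callable, Dict, List, Tuple


class FakeServer:
    """Hypothetical in-memory stand-in for the application's server."""

    def __init__(self, data: Dict[str, str]):
        self.data = dict(data)

    def upload(self, changes: List[Tuple[str, str]]) -> None:
        for key, value in changes:
            self.data[key] = value  # last upload wins: any server-side edit is overwritten

    def download(self) -> Dict[str, str]:
        return dict(self.data)


def sync_once(server: FakeServer, changes: List[Tuple[str, str]], local: Dict[str, str]) -> None:
    """Manual sync: push all queued local changes, then take a fresh server copy."""
    server.upload(changes)
    changes.clear()
    local.clear()
    local.update(server.download())


class BackgroundSyncer:
    """Background sync: run a sync routine every interval_s seconds on a daemon thread."""

    def __init__(self, sync: Callable[[], None], interval_s: float):
        self._sync = sync
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.failures = 0

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                self._sync()
            except ConnectionError:
                self.failures += 1             # offline: changes stay queued, retry next tick
            self._stop.wait(self._interval_s)  # returns early once stop() is called

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

Note that `upload` blindly overwrites server values, so whichever device syncs last wins. That is precisely the reconciliation problem raised above, and choosing manual or background triggering does nothing to solve it.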

I don't consider myself some sort of expert on data synchronization protocols but it seems to me that there is a lot more to figuring out a data synchronization strategy than whether it should be done based on user action or automatically in the background without user intervention. It seems that there would be all sorts of decisions around consistency models and single vs. multi-master designs that developers would have to make as well. And that's just for a fairly straightforward application like Google Reader. Can you imagine what it would be like to use Google Gears to replicate the functionality of Outlook in the offline mode of Gmail or to make Google Docs & Spreadsheets behave properly when presented with conflicting versions of a document or spreadsheet because the user updated it from the Web and in offline mode?  
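One standard building block for the multi-master case is a version vector: each replica counts its own updates, and comparing two vectors tells you whether one copy is simply stale or whether both sides changed independently. A minimal sketch follows; the replica names are hypothetical and this is not something Gears provides.

```python
from typing import Dict

VersionVector = Dict[str, int]  # replica id -> number of updates made on that replica


def bump(vv: VersionVector, replica: str) -> VersionVector:
    """Return a copy of vv that records one more local update on `replica`."""
    out = dict(vv)
    out[replica] = out.get(replica, 0) + 1
    return out


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify a relative to b: 'equal', 'before', 'after' or 'concurrent'."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"    # b has seen everything a has, so b can simply replace a
    if b_le_a:
        return "after"
    return "concurrent"    # each side has updates the other lacks: a real conflict


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Vector for a value that reconciles a and b (element-wise maximum)."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}
```

With the laptop and desktop scenario above, `compare(bump({}, "laptop"), bump({}, "desktop"))` returns `"concurrent"`, which is exactly the case where some application-specific policy has to decide what happens.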

It seems that without providing data synchronization out of the box, Google Gears leaves the most difficult and cumbersome aspect of building a disconnected Web app up to application developers. This may be OK for Google developers using Google Gears, since the average Google coder is a Ph.D., but the platform isn't terribly useful to Web application developers who want to use it for anything besides a super-sized HTTP cookie. 

A number of other bloggers such as Roger Jennings and Tim Anderson have also pointed out that the lack of data synchronization in Google Gears is a significant oversight. If Google intends for Google Gears to become a platform that will be generally useful to the average Web developer, then the company will have to fix this oversight. Otherwise, they haven't done as much for the Web development world as the initial hype led us to believe. 


 

Wednesday, 06 June 2007 05:16:01 (GMT Daylight Time, UTC+01:00)
I personally wouldn't hasten to pass judgement on the lack of synchronization solutions in Gears today. It is an early Beta after all, and the omission may have everything to do with time and resources.

Synchronization is a very tough problem. It is possible to solve it, though, as several desktop apps have demonstrated. And Google has some of the best minds in the business to solve this. However, applications enjoy the advantage of specificity in their solutions, whereas platforms require a general-purpose solution. As Google moves into the "uncharted waters" of being more of a platform provider, it will have to encounter and solve these difficult problems that folks at Microsoft have been dealing with and fixing for years.

It is also significant that Gears (like Picasa) is an exception to Google's ethic of not repeating Microsoft's mistakes [1] with deployed client software.

1: http://mark-lucovsky.blogspot.com/2005/02/shipping-software.html
Wednesday, 06 June 2007 05:50:36 (GMT Daylight Time, UTC+01:00)
If Google Gears came out saying "we offer you one button to offline enable your apps" that would be one thing.

In fact, the opposite has been true. If you listen to the Google Gears engineers they have been very careful to say that they are going for baby steps here.

It would be incredibly ballsy to come out and say "ok, we have solved the sync issues and here is how you will do it". It is a hard, if not impossible, problem to generalize. I think that we need to play with solutions to particular use cases and then come up with some practices: things that worked, things that didn't. Maybe then Gears itself will be able to offer more.

No one has said that making apps offline is simple :)
Wednesday, 06 June 2007 06:55:58 (GMT Daylight Time, UTC+01:00)
As others have noted, synchronization that works in the general case is incredibly hard to do (Lotus Notes provides one, for example, and the joke goes that in case of a Replication Conflict you can count on Notes to do the wrong thing).

Instead of creating an overcomplicated one-size-fits-all approach, maybe applications should choose their own sync strategies? This doesn't mean they have to write it from scratch; we could see plug-in sync libraries in the future, along with patterns that describe how to prep your datastore to store replication conflicts and how to let your UI convey conflicts to the user.
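A minimal sketch of the "store the conflict instead of guessing" pattern, using Python's built-in `sqlite3` module as a stand-in for the local database. The schema (a `dirty` flag plus the server version each local copy was based on) and all table and function names are assumptions for illustration. When a newer server version arrives for a document that also has unsynced local edits, both copies are kept in a `conflicts` table for the UI to show the user.

```python
import sqlite3
from typing import List, Tuple


def open_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE docs (
            id TEXT PRIMARY KEY,
            body TEXT NOT NULL,
            base_version INTEGER NOT NULL,   -- server version this local copy started from
            dirty INTEGER NOT NULL DEFAULT 0 -- 1 if edited locally since the last sync
        );
        CREATE TABLE conflicts (
            doc_id TEXT NOT NULL,
            local_body TEXT NOT NULL,
            server_body TEXT NOT NULL
        );
    """)
    return db


def edit_locally(db: sqlite3.Connection, doc_id: str, body: str) -> None:
    db.execute("UPDATE docs SET body = ?, dirty = 1 WHERE id = ?", (body, doc_id))
    db.commit()


def receive_server_version(db: sqlite3.Connection, doc_id: str, body: str, version: int) -> None:
    row = db.execute(
        "SELECT body, base_version, dirty FROM docs WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None or not row[2]:
        # No unsynced local edits: just take the server copy.
        db.execute("INSERT OR REPLACE INTO docs VALUES (?, ?, ?, 0)", (doc_id, body, version))
    elif version > row[1]:
        # Both sides changed since the last sync: keep both copies and let the user decide.
        db.execute("INSERT INTO conflicts VALUES (?, ?, ?)", (doc_id, row[0], body))
    db.commit()


def pending_conflicts(db: sqlite3.Connection) -> List[Tuple[str, str, str]]:
    return db.execute("SELECT doc_id, local_body, server_body FROM conflicts").fetchall()
```

The local edit is never silently discarded; the UI can list `pending_conflicts` and let the user pick or merge, which is roughly what Notes does with its replication-conflict documents.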

Wednesday, 06 June 2007 07:13:14 (GMT Daylight Time, UTC+01:00)

I view Google Gears as a competitor to a versioning program (e.g. Perforce, ...)

Synching Google docs and spreadsheets can be seen as involving the usual 3-way merge. I am not sure what's so fundamentally hard when you think about it. It's not like this problem has not been solved. It's not like Google Gears makes it harder than without.
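For record-shaped data, a basic three-way merge compares each key against the common ancestor (the copy as of the last sync): if only one side changed a key, take that change; if both changed it differently, report a conflict. A sketch, assuming values are plain strings and using a hypothetical feed-to-folder mapping; note that it only works if the client keeps that ancestor copy around.

```python
from typing import Dict, Tuple

_MISSING = object()  # marks "key absent on this side" (never added, or deleted)


def three_way_merge(base: Dict[str, str], ours: Dict[str, str],
                    theirs: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, tuple]]:
    """Merge two edited copies of `base`; return (merged, conflicts)."""
    merged: Dict[str, str] = {}
    conflicts: Dict[str, tuple] = {}
    for key in set(base) | set(ours) | set(theirs):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:
            result = o               # both sides agree (possibly both deleted it)
        elif o == b:
            result = t               # only "theirs" changed this key
        elif t == b:
            result = o               # only "ours" changed this key
        else:
            conflicts[key] = (o, t)  # both changed it differently; _MISSING means deleted
            continue
        if result is not _MISSING:
            merged[key] = result
    return merged, conflicts
```

For the Google Reader scenario, the laptop's unsubscribe and the desktop's new subscription both survive the merge, while a feed moved to different folders on each machine comes back as a conflict that still needs a policy.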

Remember, this stuff is a generic library at this point.

Wednesday, 06 June 2007 08:49:53 (GMT Daylight Time, UTC+01:00)
At the end of the day Google Gears is not that big a deal: they have taken a relational database, a local web server and a thread pool and wrapped them up into a browser plugin which can be used by developers. Surely it is up to the developers who program against this plugin/API to decide how their synchronization strategy will work, and not Google?
Wednesday, 06 June 2007 10:39:59 (GMT Daylight Time, UTC+01:00)
Dare, for now, getting offline capability with a very simple one-line script call is enough for most people. Look at Remember the Milk, who have just implemented Gears. I guess the sync issue is the reason why we don't have offline Gmail and GCal yet, but as soon as we do, Office bloatware and pricing will go.
Wednesday, 06 June 2007 11:18:40 (GMT Daylight Time, UTC+01:00)
I wonder whether there might be a RESTful solution to synchronisation for Google Gears. In theory, since all of the HTTP requests made to the local server are stateless, they could be saved in the database and replayed to the server when the app is connected again. In practice...
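A minimal sketch of that idea: requests made while offline are appended to an outbox table (with Python's `sqlite3` standing in for the local database) and replayed oldest-first once the app reconnects. The URLs, the `send` callback, and the status-code policy are all hypothetical.

```python
import json
import sqlite3
from typing import Callable, Optional


def open_outbox(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS outbox (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,  -- preserves the order requests were made in
        method TEXT NOT NULL,
        url TEXT NOT NULL,
        body TEXT                               -- JSON-encoded payload, or NULL
    )""")
    return db


def enqueue(db: sqlite3.Connection, method: str, url: str, body: Optional[dict] = None) -> None:
    db.execute("INSERT INTO outbox (method, url, body) VALUES (?, ?, ?)",
               (method, url, None if body is None else json.dumps(body)))
    db.commit()


def replay(db: sqlite3.Connection, send: Callable[[str, str, Optional[dict]], int]) -> int:
    """Send queued requests oldest-first; delete each one the server accepts (2xx)."""
    rows = db.execute("SELECT seq, method, url, body FROM outbox ORDER BY seq").fetchall()
    sent = 0
    for seq, method, url, body in rows:
        status = send(method, url, None if body is None else json.loads(body))
        if not 200 <= status < 300:
            break  # stop here so later requests are not applied out of order
        db.execute("DELETE FROM outbox WHERE seq = ?", (seq,))
        db.commit()
        sent += 1
    return sent
```

Stopping at the first non-2xx response keeps later requests from being applied out of order, but someone still has to decide what to do with a request the server rejects as stale (say, with a 409). That is where the "In practice..." part begins.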
Keith
Wednesday, 06 June 2007 15:26:35 (GMT Daylight Time, UTC+01:00)
As others said, Google's goal is to provide an API for building offline applications. Your view, I guess, comes from being accustomed to Microsoft's approach of delivering huge pieces of software even for Betas or CTPs. Google is more agile in this sense: they always release tiny bits of functionality and then build up quite quickly based on user feedback instead of lab tests.

I like Microsoft becoming more iterative, but I see they are still very lab-oriented.

I really appreciate Google's approach. My gut feeling is that they'll keep releasing tiny synchronization features for developers to tackle different scenarios. Remember, replication is one of the most complex problems in the industry. How many key books on database replication/synchronization do you know? I mean on this specific topic?

Over 25 years, I have participated in dozens of projects with heavy synch issues, and although some patterns reappear, every solution has its own edges.

A very cool problem for us geeks-at-heart!




Wednesday, 06 June 2007 15:49:23 (GMT Daylight Time, UTC+01:00)
Updated http://oakleafblog.blogspot.com/2007/06/google-gears-piques-new-interest-in.html for this post and RememberTheMilk
Wednesday, 06 June 2007 16:59:02 (GMT Daylight Time, UTC+01:00)
Brent Simmons has a couple of great posts about how hard synchronization of news feeds is in reality.

http://inessential.com/?comments=1&postid=2760

http://inessential.com/?comments=1&postid=3307

He essentially worked on the same problem that Google is, synching an offline news reader with an online one.
Wednesday, 06 June 2007 19:09:16 (GMT Daylight Time, UTC+01:00)
Maybe I don't know enough about data synchronization to ask this question myself, but how can one design a synchronization strategy that fits every application? A synchronization strategy *must* be driven by the data fidelity requirements of the application using it.

Let's say you have this problem: a machine that was previously offline with some changes to a stale copy now comes online -- what is the right thing to do?

If it's a forum application, write() really means 'append' a post to some discussion thread -- so maybe you can update the local cache from the server, and append any queued writes to the end of the discussion thread.

For a document editing app, we should detect that a change was made since we went offline, and pop up a merging dialog so the user can ensure the correctness of all his document changes (merging queued writes with writes from his other online session).

The best thing Gears (or any library) can do is provide built-in versioning and allow users to write plug-ins that are invoked when version mismatches occur. Ultimately, only the *application writer* knows what abstraction the data encapsulates and, consequently, what it means to sync it coherently.
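A sketch of that design: a store that tracks a version number per record and, when a write was based on an outdated version, hands off to a resolver registered for that kind of data. The two resolvers mirror the forum and document examples above. All class and function names are hypothetical, and the in-memory dictionary stands in for the authoritative store.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Record:
    kind: str      # selects the resolver, e.g. "thread" or "document"
    value: object
    version: int   # bumped on every accepted write


# A resolver gets (value written by the client, current stored value) and returns the merge.
Resolver = Callable[[object, object], object]


class MergeRequired(Exception):
    """Raised by a resolver that wants the user to merge the two versions by hand."""

    def __init__(self, local: object, server: object):
        super().__init__("manual merge required")
        self.local, self.server = local, server


class VersionedStore:
    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}
        self.resolvers: Dict[str, Resolver] = {}

    def register(self, kind: str, resolver: Resolver) -> None:
        self.resolvers[kind] = resolver

    def put(self, key: str, kind: str, value: object) -> None:
        self.records[key] = Record(kind, value, 1)

    def write(self, key: str, value: object, base_version: int) -> int:
        """Store `value`, written against `base_version`; resolve first if that is stale."""
        current = self.records[key]
        if base_version != current.version:
            value = self.resolvers[current.kind](value, current.value)  # may raise
        self.records[key] = Record(current.kind, value, current.version + 1)
        return current.version + 1


def append_posts(local: list, server: list) -> list:
    # Forum threads: keep the server's thread and append posts only the client has.
    return server + [post for post in local if post not in server]


def ask_user(local: object, server: object) -> object:
    # Documents: never guess; hand both versions to the UI's merge dialog.
    raise MergeRequired(local, server)
```

A stale write to a thread quietly appends the offline post after the server's newer replies, while a stale write to a document raises `MergeRequired` and leaves the stored copy untouched until the user resolves it.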
Thursday, 07 June 2007 01:09:04 (GMT Daylight Time, UTC+01:00)
As others have pointed out, solving synchronization as a general problem is quite hard, definitely in the research realm.

However, there are examples of people who have done synchronization right, or at least close to right. The most notable of those is Ray Ozzie. But as I understand it both Notes and Groove make synch manageable by making some assumptions about the forms that data can take that are not as general as a true db. Could Google Gears (or a competing abstraction) make similar assumptions and come up with a working in-browser synch scheme? Maybe, but it won't be easy.

In the meantime, I think it quite likely that people will start delivering higher-level abstractions on top of Google Gears to address particular sets of problem domains. Since Gears itself is open source, most of these abstractions will also be open source. It's going to be very tough for a competing persistence layer to win out.

Bottom line: Microsoft should swallow its pride/NIH and support Google Gears directly in IE.
Thursday, 07 June 2007 01:11:46 (GMT Daylight Time, UTC+01:00)
Oh, and by the way, many of the problems you point out are also a problem today when syncing between mobile and pc, or when syncing between two PCs. Groove is good, but Microsoft's more general file synch stuff blows big chunks...
Thursday, 07 June 2007 02:38:31 (GMT Daylight Time, UTC+01:00)
I'm far from being an expert on synchronization, but wouldn't a properly timestamped journal solve this problem (at least as far as Google Reader is concerned)?

When the user syncs up with the server, just replay all the actions taken in chronological order, thus becoming as closely synced as possible.

I realize this is probably not the ideal solution for every case, but it would probably do for most non-financial/non-critical applications.
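A minimal sketch of the timestamped-journal idea, assuming each device records `(timestamp, device, action, target)` entries and the journals are merged by timestamp and replayed, so the latest action on each item or feed wins. The action names and the `Entry`/`replay_journals` names are hypothetical.

```python
import heapq
from typing import Iterable, List, NamedTuple, Set, Tuple


class Entry(NamedTuple):
    ts: float     # when the action happened, per the device's own clock
    device: str   # tie-breaker so equal timestamps replay in a stable order
    action: str   # "read", "unread", "subscribe" or "unsubscribe" in this sketch
    target: str   # item or feed identifier


def replay_journals(subscriptions: Set[str], read: Set[str],
                    journals: Iterable[List[Entry]]) -> Tuple[Set[str], Set[str]]:
    """Merge per-device journals by timestamp and apply them to copies of the given state."""
    subs, seen = set(subscriptions), set(read)
    for e in heapq.merge(*(sorted(j) for j in journals)):
        if e.action == "subscribe":
            subs.add(e.target)
        elif e.action == "unsubscribe":
            subs.discard(e.target)
        elif e.action == "read":
            seen.add(e.target)
        elif e.action == "unread":
            seen.discard(e.target)
        else:
            raise ValueError(f"unknown action {e.action!r}")
    return subs, seen
```

The main caveat is clock skew: if the laptop's clock is off, its actions are replayed in the wrong position relative to the desktop's.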
Saturday, 09 June 2007 17:55:25 (GMT Daylight Time, UTC+01:00)
My previous comment was lost somehow.
Yes, sync in general is hard. But Google Reader is probably one of the easier Google apps to add offline support to (timestamps are probably enough to resolve conflicts).

In any case, Google Gears lets you focus on the harder problem, getting the more trivial stuff out of the way. Also, by getting more people thinking about the problem, maybe some solutions will emerge.
The more likely outcome though is that there will not be a completely generic sync solution, but rather that some patterns will emerge. I'm especially hopeful to see what the "offline community" can learn from the collaborative editing folks (SubEthaEdit, source control, Groove).
The undo/redo pattern of queuing change operations seems the cleanest approach so far, although you are right that it still requires lots of custom code.
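A sketch of the undo/redo approach for read state, under the assumption that every local change is a command object that remembers enough to undo itself. On reconnect, the client undoes its unsent commands, applies the server's changes, and redoes its own commands on top (essentially a rebase). `SetRead` and `rebase` are hypothetical names, and letting local commands win on the same item is a policy choice, not the only option.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SetRead:
    item: str
    value: bool                                      # True = mark read, False = mark unread
    before: bool = field(default=False, init=False)  # prior state, captured by do()

    def do(self, read: Set[str]) -> None:
        self.before = self.item in read
        if self.value:
            read.add(self.item)
        else:
            read.discard(self.item)

    def undo(self, read: Set[str]) -> None:
        if self.before:
            read.add(self.item)
        else:
            read.discard(self.item)


def rebase(read: Set[str], pending: List[SetRead], incoming: List[SetRead]) -> None:
    """Replay unsent local commands on top of changes that arrived from the server."""
    for cmd in reversed(pending):  # 1. roll back to the last synced state, newest first
        cmd.undo(read)
    for cmd in incoming:           # 2. apply what changed on the server in the meantime
        cmd.do(read)
    for cmd in pending:            # 3. redo local commands; they win on the same item
        cmd.do(read)
```

Each command still needs custom `do`/`undo` code per kind of change, which is the "lots of custom code" cost mentioned above.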
Thursday, 14 June 2007 13:02:52 (GMT Daylight Time, UTC+01:00)
Good article and site. Congratulations.