Jenna has uploaded pictures from our wedding weekend in Las Vegas and our honeymoon in Puerto Vallarta to her Windows Live space. Below are a couple of entry points into the photo stream.

Signing the marriage licence

The blushing bride

On the way to the after party

One of the many amazing sunsets we saw in Mexico

We got to release baby turtles into the ocean

Nothing beats a pool with a beach view

These are the pictures that we took ourselves. The pics from the professionals capturing the wedding day and reception will show up in a couple of weeks. 

Now playing: Jagged Edge - Let's Get Married (remix) (feat. Run DMC)


 

Categories: Personal

Werner Vogels, CTO of Amazon, has a blog post entitled Amazon's Dynamo which contains the HTML version of an upcoming paper entitled Dynamo: Amazon’s Highly Available Key-value Store which describes a highly available, distributed storage system used internally at Amazon. 

The paper is an interesting read and a welcome addition to the body of knowledge about building megascale distributed storage systems. I particularly like that it isn’t simply another GFS or BigTable, but is unique in its own right. Hopefully, this will convince folks that just because Google were first to publish papers about their internal infrastructure doesn’t mean that what they’ve done is the bible of building megascale distributed systems. Anyway, on to some of the juicy bits.

Traditionally production systems store their state in relational databases. For many of the more common usage patterns of state persistence, however, a relational database is a solution that is far from ideal. Most of these services only store and retrieve data by primary key and do not require the complex querying and management functionality offered by an RDBMS. This excess functionality requires expensive hardware and highly skilled personnel for its operation, making it a very inefficient solution. In addition, the available replication technologies are limited and typically choose consistency over availability. Although many advances have been made in the recent years, it is still not easy to scale-out databases or use smart partitioning schemes for load balancing.

Although I work for a company that sells a relational database product, I think it is still fair to say that there is a certain level of scale where practically every feature traditionally associated with an RDBMS works against you.

Luckily, there are only a handful of companies and Web services in the world that need to operate at that scale.

2.1 System Assumptions and Requirements

The storage system for this class of services has the following requirements:

Query Model: simple read and write operations to a data item that is uniquely identified by a key. State is stored as binary objects (i.e., blobs) identified by unique keys. No operations span multiple data items and there is no need for relational schema. This requirement is based on the observation that a significant portion of Amazon’s services can work with this simple query model and do not need any relational schema. Dynamo targets applications that need to store objects that are relatively small (usually less than 1 MB).

ACID Properties: ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. Experience at Amazon has shown that data stores that provide ACID guarantees tend to have poor availability. This has been widely acknowledged by both the industry and academia [5]. Dynamo targets applications that operate with weaker consistency (the “C” in ACID) if this results in high availability. Dynamo does not provide any isolation guarantees and permits only single key updates.

Other Assumptions: Dynamo is used only by Amazon’s internal services. Its operation environment is assumed to be non-hostile and there are no security related requirements such as authentication and authorization. Moreover, since each service uses its distinct instance of Dynamo, its initial design targets a scale of up to hundreds of storage hosts. We will discuss the scalability limitations of Dynamo and possible scalability related extensions in later sections.

Lots of worthy items to note here. The first is that you can get a lot of traction out of a simple data structure such as a hash table. Specifically, as noted by Sam Ruby in his post Key + Data, accessing data by key instead of using complex queries is becoming a common pattern in large scale distributed storage systems. Sam actually missed pointing out that Google’s Bigtable is another example of this trend, given that data items within it are accessed using the tuple {row key, column key, timestamp} instead of being queried using a data manipulation language.
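To make the "hash table" point concrete, here is a minimal sketch of what a key-only query model looks like. The class name, method names and sample keys are my own assumptions for illustration and are not Dynamo's or Bigtable's actual API; the store is a plain in-memory dictionary with no replication or partitioning.

```python
from typing import Dict, Optional, Tuple


class BlobStore:
    """Toy in-memory store (hypothetical): opaque bytes addressed by one key."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Each write touches exactly one item; no operation spans several keys.
        self._items[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Lookup is by key only; there is no query language and no schema.
        return self._items.get(key)


# Bigtable-style addressing, modelled here as a composite dictionary key.
CellKey = Tuple[str, str, int]  # (row key, column key, timestamp)

store = BlobStore()
store.put("cart:42", b'{"items": ["book"]}')

cells: Dict[CellKey, bytes] = {}
cells[("com.example/index", "contents:html", 1191470400)] = b"<html>...</html>"
```

Everything an RDBMS would add on top (joins, secondary indexes, multi-row transactions) is simply absent, which is what makes this model easy to spread across many hosts.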

Another interesting thing, from my perspective, is that they’ve gotten around the scaling limit of hundreds of storage hosts per instance by having different teams at Amazon run their own instances of Dynamo. Then again, there are 200 clusters of GFS running at Google, so this is probably common sense as well.

4.4 Data Versioning

Dynamo provides eventual consistency, which allows for updates to be propagated to all replicas asynchronously. A put() call may return to its caller before the update has been applied at all the replicas, which can result in scenarios where a subsequent get() operation may return an object that does not have the latest updates. If there are no failures then there is a bound on the update propagation times. However, under certain failure scenarios (e.g., server outages or network partitions), updates may not arrive at all replicas for an extended period of time.

There is a category of applications in Amazon’s platform that can tolerate such inconsistencies and can be constructed to operate under these conditions. For example, the shopping cart application requires that an “Add to Cart” operation can never be forgotten or rejected. If the most recent state of the cart is unavailable, and a user makes changes to an older version of the cart, that change is still meaningful and should be preserved. But at the same time it shouldn’t supersede the currently unavailable state of the cart, which itself may contain changes that should be preserved. Note that both “add to cart” and “delete item from cart” operations are translated into put requests to Dynamo. When a customer wants to add an item to (or remove from) a shopping cart and the latest version is not available, the item is added to (or removed from) the older version and the divergent versions are reconciled later.

In order to provide this kind of guarantee, Dynamo treats the result of each modification as a new and immutable version of the data. It allows for multiple versions of an object to be present in the system at the same time. Most of the time, new versions subsume the previous version(s), and the system itself can determine the authoritative version (syntactic reconciliation). However, version branching may happen, in the presence of failures combined with concurrent updates, resulting in conflicting versions of an object. In these cases, the system cannot reconcile the multiple versions of the same object and the client must perform the reconciliation in order to collapse multiple branches of data evolution back into one (semantic reconciliation). A typical example of a collapse operation is “merging” different versions of a customer’s shopping cart. Using this reconciliation mechanism, an “add to cart” operation is never lost. However, deleted items can resurface.

Fascinating. I can just imagine how scary this must sound to RDBMS heads: instead of the database enforcing the rules of consistency, it just keeps multiple versions of the “row” around and then asks the client to figure out which is which if there were multiple updates that couldn’t be reconciled.
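The paper goes on to use vector clocks to tell superseded versions apart from genuinely conflicting ones. Below is a simplified sketch of the two kinds of reconciliation described in the excerpt, using a shopping cart modelled as a set of item names. The function names, the node names "A" and "B", and the union-based merge are my own assumptions for illustration, not Dynamo's implementation.

```python
from typing import Dict, List, Set, Tuple

Clock = Dict[str, int]            # node name -> update counter
Version = Tuple[Clock, Set[str]]  # (vector clock, cart contents)


def descends(a: Clock, b: Clock) -> bool:
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def syntactic_reconcile(versions: List[Version]) -> List[Version]:
    """Drop every version that some other version strictly supersedes."""
    survivors = []
    for clock, cart in versions:
        dominated = any(
            descends(other, clock) and other != clock for other, _ in versions
        )
        if not dominated:
            survivors.append((clock, cart))
    return survivors


def semantic_reconcile(versions: List[Version]) -> Version:
    """Client-side merge: union the carts and take the per-node maximum clock."""
    merged_clock: Clock = {}
    merged_cart: Set[str] = set()
    for clock, cart in versions:
        merged_cart |= cart
        for node, count in clock.items():
            merged_clock[node] = max(merged_clock.get(node, 0), count)
    return merged_clock, merged_cart


base = ({"A": 1}, {"book", "lamp"})
# Two concurrent writes made from the same base on different nodes.
removed_lamp = ({"A": 2}, {"book"})
added_pen = ({"A": 1, "B": 1}, {"book", "lamp", "pen"})

branches = syntactic_reconcile([base, removed_lamp, added_pen])
merged_clock, merged_cart = semantic_reconcile(branches)
```

Here the system can discard `base` on its own, but the two remaining branches conflict and the client has to merge them. With a union merge the added pen is never lost, and the deleted lamp comes back, which is exactly the "deleted items can resurface" caveat from the paper.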

The folks at Amazon have taken acknowledgement of the CAP Conjecture to its logical extreme. Consistency, Availability, and Partition-tolerance. Pick two.

There’s lots of other interesting stuff in the paper but I’ll save some for you to read and end my excerpts here. This will make great bedtime reading this weekend.

Now playing: Geto Boys - My Mind's Playin Tricks On Me


 


According to blog posts like A Flood of Mashups Coming? OAuth 1.0 Released and John Musser’s OAuth Spec 1.0 = More Personal Mashups?, it looks like the OAuth specification has reached its final draft.

This is good news because the need for a standardized mechanism for users to give applications permission to access their data or act on their behalf has been obvious for a while. The most obvious manifestation of this is all the applications that ask for your username and password so they can retrieve your contact list from your email service provider.

So what exactly is wrong with applications like the ones shown below?

meebo

spock

The problem with these applications [which OAuth solves] is that when I give them my username and password, I’m not only giving them access to my address book but also access to

because all of those services use the same credentials. Sounds scary when put in those terms doesn’t it?

OAuth allows a service provider (like Google or Yahoo!) to expose an interface that allows their users to give applications permission to access their data while not exposing their login credentials to these applications. As I’ve mentioned in the past, this standardizes the kind of user-centric API model that is utilized by Web services such as the Windows Live Contacts API, Google AuthSub and the Flickr API to authenticate and authorize applications to access a user’s data.  

The usage flow end users can expect from OAuth enabled applications is as follows.

1. The application or Web site informs the user that it is about to direct the user to the service provider’s Web site to grant it permission.

2. The user is then directed to the service provider’s Web site with a special URL that contains information about the requesting application. The user is prompted to log in to the service provider’s Web site to verify their identity.

 

3. The user grants the application permission.

4. The application gets access to the user’s data and the user never had to hand over their username and password to some random application which they might not trust.
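The four steps above can be sketched as a toy, in-memory simulation. This is only meant to show who sees what: the `ServiceProvider` class, its method names and the sample user are hypothetical, and the sketch deliberately leaves out everything that makes the real protocol secure on the wire (consumer keys, request signing, nonces, callback URLs).

```python
import secrets
from typing import Dict, List, Optional


class ServiceProvider:
    """Hypothetical provider that holds the user's password and data."""

    def __init__(self) -> None:
        self._passwords = {"alice": "s3cret"}
        self._contacts = {"alice": ["bob@example.com"]}
        self._apps: Dict[str, str] = {}                # request token -> app name
        self._pending: Dict[str, Optional[str]] = {}   # request token -> approving user
        self._granted: Dict[str, str] = {}             # access token -> user

    def request_token(self, app_name: str) -> str:
        # Step 1: the application obtains a token before redirecting the user.
        token = secrets.token_hex(8)
        self._apps[token] = app_name
        self._pending[token] = None
        return token

    def authorize(self, token: str, user: str, password: str) -> bool:
        # Steps 2 and 3: the user logs in on the provider's own site and approves.
        if token in self._pending and self._passwords.get(user) == password:
            self._pending[token] = user
            return True
        return False

    def access_token(self, token: str) -> Optional[str]:
        # Step 4: an approved request token is swapped for an access token.
        user = self._pending.pop(token, None)
        if user is None:
            return None
        access = secrets.token_hex(8)
        self._granted[access] = user
        return access

    def contacts(self, access: str) -> List[str]:
        # The access token, not the password, unlocks the user's data.
        return self._contacts[self._granted[access]]


provider = ServiceProvider()
request = provider.request_token("address-book-importer")
provider.authorize(request, "alice", "s3cret")  # happens on the provider's site
access = provider.access_token(request)
print(provider.contacts(access))
```

Note that the application only ever handles `request` and `access`; the password is passed to `authorize`, which in a real deployment runs on the service provider's Web site and never inside the application.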

I’ve read the final draft of the OAuth 1.0 spec and it seems to have done away with some of the worrisome complexity I’d seen in earlier drafts (i.e. single use and multi-use tokens). Great work by all those involved.

I never had time to participate in this effort but it looks like I wouldn’t have had anything to add. I can’t wait to see this begin to get deployed across the Web.

Now playing: Black Eyed Peas - Where is the Love (feat. Justin Timberlake)


 

Scott Guthrie has a blog post entitled Releasing the Source Code for the .NET Framework Libraries where he writes

One of the things my team has been working to enable has been the ability for .NET developers to download and browse the source code of the .NET Framework libraries, and to easily enable debugging support in them.

Today I'm excited to announce that we'll be providing this with the .NET 3.5 and VS 2008 release later this year.

We'll begin by offering the source code (with source file comments included) for the .NET Base Class Libraries (System, System.IO, System.Collections, System.Configuration, System.Threading, System.Net, System.Security, System.Runtime, System.Text, etc), ASP.NET (System.Web), Windows Forms (System.Windows.Forms), ADO.NET (System.Data), XML (System.Xml), and WPF (System.Windows).  We'll then be adding more libraries in the months ahead (including WCF, Workflow, and LINQ).  The source code will be released under the Microsoft Reference License (MS-RL).

This is one of those announcements I find hard to get excited about. Any developer who’s been frustrated by the weird behavior of a .NET Framework class and has wanted to look at its code should already know about Lutz Roeder’s Reflector, which is well known in the .NET developer community. So I’m not sure who this announcement is actually intended to benefit.

On the other hand, I’m sure Java developers are having a chuckle at our expense that it took this long for Microsoft to allow developers to see the source code for ArrayList.Count so we can determine if it is lazily or eagerly evaluated.

Oh well, better late than never.

PS: The ability to debug into .NET Framework classes will be nice. I’ve wanted this more than once while working on RSS Bandit and will definitely take advantage of it if I ever get around to installing VS 2008.

Now playing: TLC - Somethin Wicked This Way Comes (feat. Andre 3000)


 

Categories: Programming

This is another post I was planning to write a few weeks ago which got interrupted by my wedding and honeymoon.

A few weeks ago, Joel Spolsky wrote a post entitled Strategy Letter VI which I initially dismissed as the ravings of a desktop developer who is trying to create an analogy when one doesn’t exist. The Web isn’t the desktop, or didn’t he read There is no Web Operating System (or WebOS)? By the second time I read it, I realized that if you ignore some of the desktop-centric thinking in Joel’s article, then not only is Joel’s article quite insightful but some of what he wrote is already coming to pass.

The relevant excerpt from Joel’s article is

Somebody is going to write a compelling SDK that you can use to make powerful Ajax applications with common user interface elements that work together. And whichever SDK wins the most developer mindshare will have the same kind of competitive stronghold as Microsoft had with their Windows API.

If you’re a web app developer, and you don’t want to support the SDK everybody else is supporting, you’ll increasingly find that people won’t use your web app, because it doesn’t, you know, cut and paste and support address book synchronization and whatever weird new interop features we’ll want in 2010.

Imagine, for example, that you’re Google with GMail, and you’re feeling rather smug. But then somebody you’ve never heard of, some bratty Y Combinator startup, maybe, is gaining ridiculous traction selling NewSDK,

And while you’re not paying attention, everybody starts writing NewSDK apps, and they’re really good, and suddenly businesses ONLY want NewSDK apps, and all those old-school Plain Ajax apps look pathetic and won’t cut and paste and mash and sync and play drums nicely with one another. And Gmail becomes a legacy. The WordPerfect of Email. And you’ll tell your children how excited you were to get 2GB to store email, and they’ll laugh at you. Their nail polish has more than 2GB.

Crazy story? Substitute “Google Gmail” with “Lotus 1-2-3”. The NewSDK will be the second coming of Microsoft Windows; this is exactly how Lotus lost control of the spreadsheet market. And it’s going to happen again on the web because all the same dynamics and forces are in place. The only thing we don’t know yet are the particulars, but it’ll happen

A lot of stuff Joel asserts seems pretty clueless on the face of it. Doesn’t he realize that there are umpteen billion AJAX toolkits (e.g. Dojo, Google Web Toolkit, Yahoo! User Interface Library, Script.aculo.us, etc)  and rich internet application platforms (e.g. Flash, Silverlight, XUL, etc)? Doesn’t he realize that there isn’t a snowball’s chance in hell of the entire Web conforming to standard user interface guidelines let alone everyone agreeing on using the same programming language and SDK to build Web apps?

But wait…

What happens if you re-read the above excerpt and substitute NewSDK with Facebook platform?

I didn’t classify Facebook as a Social Operating System for no reason. GMail and other email services have become less interesting to me because I primarily communicate with friends and family on the Web via Facebook and its various platform applications. I’ve stopped playing casual games at Yahoo! Games and now use Scrabulous and Texas Hold ‘Em when I want to idle some time away on the weekend. All of these applications are part of a consistent user interface, are all accessible from my sidebar and each of them has access to my data within Facebook including my social graph. Kinda like how Windows or Mac OS X desktop applications on my machine have a consistent user interface, are all accessible from my applications menu and can all access the data on my hard drive.

Hmmm…

I suspect that Joel is right about NewSDK, he’s just wrong about which form it will take. “Social operating system” does have a nice ring to it, doesn’t it?

Now playing: Kanye West - Two Words (feat. Mos Def, Freeway & The Harlem Boys Choir)


 

Categories: Platforms | Web Development

October 4, 2007
@ 04:00 AM

Mini-Microsoft has a blog post up to let us know that his Facebook account was cancelled. In the comments he clarifies he wasn’t specifically targeted and this is just part of the Facebook terms of service. He writes

For those who probably will never see this Facebook help-topic, this is what I've been directed to:

http://www.facebook.com/help.php?page=45

The only relevant text that I can find:

"Facebook does not allow users to register with fake names, to impersonate any person or entity, or to falsely state or otherwise misrepresent themselves or their affiliations."

I imagine they only do something when someone complains vs. being constantly policing things. And someone out there (scanning the crowd of exceptionally good looking people who visit here) must have taken it upon themselves to complain.

I didn’t realize that if I don’t provide 100% accurate data about myself (thus making identity theft easier) I could get my account banned from Facebook.

I can understand why they want to encourage people to use real names since they want to be the kind of place that has users like “Dare Obasanjo” and “Robert Scoble” not ‘carnage4life’ and ‘scobleizer’ since the former implies a more personal experience.

However it seems dumb to be trying to replicate Friendster’s mistake by killing off every account that didn’t conform to their standards. There are ways to encourage such behavior without being jerks as they’ve clearly been in this case.

Now playing: Dem Franchize Boyz - Oh I Think They Like Me (remix) (feat. Jermaine Dupri, Da Brat & Lil Bow Wow)


 

Yesterday morning, I tossed out a hastily written post entitled It Must Be a Fun Time to Work on Microsoft Office which seems to have been misread by some folks based on some of the comments I’ve seen on my blog and in other places. So further exposition of some of the points in that post seems necessary.

First of all, there’s the question of who I was calling stupid when talking about the following announcements

  • Google announcing the launch of Presently, their Web-based Powerpoint clone. Interestingly enough, one would have expected presentation software to be the most obvious application to move to the Web first instead of the last.
  • Yahoo! announcing the purchase of Zimbra, a developer of a Web-based office productivity and collaboration suite.
  • Microsoft announcing that it would integrate Web-based storage and collaboration into its desktop office productivity suite.
  • IBM announcing that it would ship its own branded version of an Open Source clone of Microsoft’s desktop productivity suite.

Given that three of these announcements are about embracing the Web and the last one is about building disconnected desktop software, I assumed it was obvious who was jumping on a dying paradigm while the rest of the industry has already moved towards the next generation. To put this another way, James Robertson’s readers were right that I was talking about IBM.

There is something I did want to call out about James Robertson’s post. He wrote

People have moved on to the 80% solution that is the web UI, because the other advantages outweigh that loss of "richness".

I don’t believe that statement when it comes to office productivity software. I believe that the advantages of leveraging the Web are clear. From my perspective

  1. universal access to my data from any device or platform 
  2. enabling collaboration with “zero install” requirements on collaborators

are clear advantages that Web-based office productivity software has over disconnected desktop software.

It should be noted that neither of these advantages requires that the user interface is Web-based or that it is rich (i.e. AJAX or Flash if it is Web-based). Both of these things help but they aren’t a hard requirement.

What is important is universal access to my data via the Web. I don’t have an iPhone because I’m hooked on my Windows Mobile device and the rich integration it has with my work email, calendar and task list. The applications on my phone aren’t Web-based; they are the equivalent of “desktop applications” for my phone. I also didn’t have to install them because they were already on my phone [actually I did have to install Oxios ToDo List but that’s only because the out-of-the-box task list synchronization in Windows Mobile 5 was less than optimal for my needs].

I used to think that having a Web-based interface was also inevitable but that position softened once I realized that you’ll need offline support which means building support for local storage + synchronization into the application (e.g. Google Reader's offline mode) to truly hit the 80/20 point for most people given how popular laptops are these days. However once you’ve built that platform, the same storage and synchronization engine could be used by a desktop application as well.

In that case, either way I get what I want. So desktop vs. Web-based UI doesn’t matter since they both have to stretch themselves to meet my needs. But it is probably a shorter jump to Web-enable the desktop applications than it is to offline-enable the Web applications.  
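As a rough sketch of what I mean by a shared storage and synchronization engine, consider the toy below. The `SyncedStore` class and its methods are hypothetical, the "server" is just a dictionary standing in for a Web service, and conflicts are resolved by letting the last write win, which a real product would need to improve on.

```python
from typing import Dict, List, Tuple


class SyncedStore:
    """Local copy of documents plus a queue of edits not yet sent to the server."""

    def __init__(self, server: Dict[str, str]) -> None:
        self._server = server                       # stand-in for a remote Web service
        self._local: Dict[str, str] = dict(server)  # local storage
        self._outbox: List[Tuple[str, str]] = []    # edits waiting for a connection
        self.online = True

    def save(self, name: str, body: str) -> None:
        # Writes always land locally, so the UI works with or without a network.
        self._local[name] = body
        self._outbox.append((name, body))
        if self.online:
            self.sync()

    def load(self, name: str) -> str:
        return self._local[name]

    def sync(self) -> None:
        # Replay queued edits in order, then refresh the local copy.
        for name, body in self._outbox:
            self._server[name] = body
        self._outbox.clear()
        self._local.update(self._server)


server = {"notes.txt": "v1"}
docs = SyncedStore(server)  # could sit behind a browser UI or a desktop UI
docs.online = False
docs.save("notes.txt", "v2 written on a plane")
docs.online = True
docs.sync()
```

Nothing in `SyncedStore` knows or cares whether the code calling `save` and `load` is a Web page or a desktop application, which is the point.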

Now playing: Playa Fly - Feel Me


 

This is one of those posts I started before I went on my honeymoon and never got around to finishing. There are lots of interesting things happening in the world of office productivity software these days. Here are four announcements from the past three weeks that show just how things are heating up in this space, especially if you agree with Steve Gillmor that Office is Dead *(see footnote).

From the article Google Expands Online Software Suite 

MOUNTAIN VIEW, Calif. (AP) — Google Inc. has expanded its online suite of office software to include a business presentation tool similar to Microsoft Corp.'s popular PowerPoint, adding the latest twist in a high-stakes rivalry.

Google's software suite already included word processing, spreadsheet and calendar management programs. Microsoft has been reaping huge profits from similar applications for years.

Unlike Google's applications, Microsoft's programs are usually installed directly on the hard drives of computers.

From the article I.B.M. to Offer Office Software Free in Challenge to Microsoft’s Line

I.B.M. plans to mount its most ambitious challenge in years to Microsoft’s dominance of personal computer software, by offering free programs for word processing, spreadsheets and presentations.

Steven A. Mills, senior vice president of I.B.M.’s software group, said the programs promote an open-source document format.

The company is announcing the desktop software, called I.B.M. Lotus Symphony, at an event today in New York. The programs will be available as free downloads from the I.B.M. Web site.

From the blog post Yahoo scoops up Zimbra for $350 million

Yahoo has been on an acquisition binge late, but mostly to expand its advertising business. Now Yahoo is buying its way deeper into the applications business with the acquisition of Zimbra for a reported $350 million, mostly in cash. Zimbra developed a leading edge, Web 2.0 open source messaging and collaboration software suite, with email, calendar, document processing and a spreadsheet.

and finally, from the press release Microsoft Charts Its Software Services Strategy and Road Map for Businesses

 Today Microsoft also unveiled the following:

  • Microsoft® Office Live Workspace, a new Web-based feature of Microsoft Office that lets people access their documents online and share their work with others

Office Live Workspace: New Web Functionality for Microsoft Office

Office Live Workspace is among the first entries in the new wave of online services. Available at no charge, Office Live Workspace lets people do the following:

  • Access documents anywhere. Users can organize documents and projects for work, school and home online, and work on them from almost any computer even one not connected to the company or school network. They can save more than 1,000 Microsoft Office documents to one place online and access them via the Web.
  • Share with others. Users can work collaboratively on a project with others in a password-protected, invitation-only online workspace, helping to eliminate version-control challenges when e-mailing drafts to multiple people. Collaborators who don’t have a desktop version of Microsoft Office software can still view and comment on the document in a browser.

As you can see, one of these four announcements is not like the others. Since it isn’t fair to pick on the stupid, I’ll let you figure out which company is jumping on a dying paradigm while the rest of the industry has already moved towards the next generation. The Web is no longer the future of computing; computing is now about the Web.

* I do. Disconnected desktop software needs to go the way of the dodo.

Now playing: Prince - Sign 'O' the Times


 

October 2, 2007
@ 03:20 PM

Over a year ago, I commented that sometimes it feels like working at Microsoft is like working in Dinosaur Country. Every time I hear the phrase “software as a service” or its cousin “software plus services”, I feel this way. Most of the people uttering this crap don’t realize that this makes them sound as dated as the old codgers who kept on talking about “horseless carriages” when everyone else called them automobiles or just plain cars.

Case in point, this article from the Telegraph entitled Microsoft powers up for change which contains this humdinger of an opening paragraph

Chief executive says free software, downloadable online, is on the horizon for consumers. Josephine Moulds reports

Steve Ballmer, chief executive of Microsoft, yesterday signalled another step towards a dramatic change in the software giant's business model.

In London on a whistle-stop tour, Ballmer was discussing the delivery of software packages over the internet. "We are a software company, and yet in a sense, the very form of our core capability is changing. We need to change our capabilities so that we are not just good at writing bits that you put out on CD and deliver, but rather writing this thing that is a living, breathing, dynamic, organic thing."

What’s next? A press release announcing that pasteurization may not be a fad? A news story conceding that heavier-than-air aircraft may just be the way to go after all? 

*sigh*

Now playing: The Verve - Bitter Sweet Symphony


 

Categories: Life in the B0rg Cube

I scored an invite to FriendFeed and after trying out the service, I have to say it is both disappointing and encouraging at the same time. It is disappointing because one would expect folks like Bret Taylor and Paul Buchheit who helped launch Google Maps, Gmail and AdSense while at Google to come up with something more innovative than a knock-off of Plaxo Pulse and Google’s SocialStream which are themselves knock-offs of the Facebook News feed.

On the other hand, this is encouraging because it is another example of how the digital lifestyle aggregator is no longer just a far out idea being tossed around on Marc Canter’s blog but instead has become a legitimate product category.  

So what exactly is FriendFeed? The site enables users to associate themselves with the various user generated content (UGC) sites which they use regularly that publish RSS feeds or provide open APIs and then this is turned into the equivalent of a Facebook Mini Feed for the user. You can get a good idea of it by viewing my page at http://friendfeed.com/carnage4life which aggregates the recent activities from my profiles on reddit, digg, and youtube.

The “innovation” with FriendFeed is that instead of asking you to provide the URLs of your RSS feeds, the site figures out your RSS feed from your username on the target service. See the screenshot below for this in action

Of course, this same “innovation” exists in Plaxo Pulse so this isn’t mindblowing. If anything, FriendFeed is currently a less feature rich version of Plaxo Pulse.
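The username trick is simple enough to sketch in a few lines. The service names and URL patterns below are made up for illustration (they point at example.com), since each real service publishes its feeds at its own location.

```python
from typing import Dict
from urllib.parse import quote

# Hypothetical URL patterns; real services define their own feed locations.
FEED_TEMPLATES: Dict[str, str] = {
    "photos": "https://photos.example.com/users/{username}/feed.rss",
    "links": "https://links.example.com/{username}/rss",
}


def feed_url(service: str, username: str) -> str:
    """Build a feed URL from a service name and the user's name on that service."""
    template = FEED_TEMPLATES.get(service)
    if template is None:
        raise KeyError(f"no feed template for service {service!r}")
    # Escape the username so it cannot break out of its path segment.
    return template.format(username=quote(username, safe=""))


print(feed_url("links", "carnage4life"))
```

Once the aggregator has a table like this for every service it supports, the user only ever types a username and never has to hunt for a feed URL.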

I personally doubt that this site will catch on because it suffers from the same chicken and egg problem that faces all social networking sites that depend on network effects. And if it does catch on, given that there is zero barrier to entry in the feature-set they provide, I wouldn’t be surprised to see Facebook and a host of other services roll this into their feature set. I expect that News Feed style pages will eventually show up in a majority of social sites, in much the same way that practically every website these days has a friends list and encourages user generated content. It’s just going to be another feature when it comes to making a website, kinda like using tabs for navigation.

I’m sure Marc Canter finds this validation of his vision quite amusing.

Now playing: Puddle of Mudd - Control