In my recent post on building a Twitter search engine on Windows Azure, I questioned the need to expose the notion of both partition and row keys to developers on the platform. Since then I've had conversations with a couple of folks at work which indicate that I should have stated my concerns more explicitly. So here goes.

The documentation on Understanding the Windows Azure Table Storage Data Model states the following

PartitionKey Property

Tables are partitioned to support load balancing across storage nodes. A table's entities are organized by partition. A partition is a consecutive range of entities possessing the same partition key value. The partition key is a unique identifier for the partition within a given table, specified by the PartitionKey property. The partition key forms the first part of an entity's primary key. The partition key may be a string value up to 32 KB in size.

You must include the PartitionKey property in every insert, update, and delete operation.

RowKey Property

The second part of the primary key is the row key, specified by the RowKey property. The row key is a unique identifier for an entity within a given partition. Together the PartitionKey and RowKey uniquely identify every entity within a table.

The row key is a string value that may be up to 32 KB in size.

You must include the RowKey property in every insert, update, and delete operation.

In my case I'm building an application to represent users in a social network and each user is keyed by user ID (e.g. their Twitter user name). In my application I only have one unique key and it identifies each row of user data (e.g. profile pic, location, latest tweet, follower count, etc.). My original intuition was to use the unique ID as the row key while letting the partition key be a single value. The purpose of the partition key is to act as a hint that a set of data belongs on the same machine, which in my case seemed like overkill.

Where this design breaks down is when I actually end up storing more data than the Windows Azure system can or wants to fit on a single storage node. For example, what if I've actually built a Facebook crawler (140 million users) and I cache people's profile pics locally (10 kilobytes each)? This ends up being 1.3 terabytes of data. I highly doubt that the Azure system will be allocating 1.3 terabytes of storage on a single server for a single developer and even if it did the transaction performance would suffer. So the only reasonable assumption is that the data will either be split across various nodes at some threshold [which the developer doesn't know] or at some point the developer gets a "disk full error" (i.e. a bad choice which no platform would make).
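Spelling out the arithmetic behind that figure (the 1.3 comes from counting in binary terabytes):

$$1.4 \times 10^{8}\ \text{users} \times 10 \times 2^{10}\ \text{bytes} \approx 1.43 \times 10^{12}\ \text{bytes} \approx 1.3\ \text{TB}$$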

On the other hand, if I decide to use the user ID as the partition key then I am in essence allowing the system to theoretically store each user on a different machine or at least split up my data across the entire cloud. That sucks for me if all I have is three million users for whom I'm only storing 1K of data each, since that could easily fit on a single storage node. Of course, the Windows Azure system could be smart enough to not split up my data since it fits underneath some threshold [which the developer doesn't know]. And this approach also allows the system to take advantage of parallelism across multiple machines if it does split my data.

Thus I'm now leaning towards the user ID being the partition key instead of the row key. So what advice do the system's creators actually have for developers?

Well, from the discussion thread POST to Azure tables w/o PartitionKey/RowKey: that's a bug, right? on the MSDN forums, there is the following advice from Niranjan Nilakantan of Microsoft:

If the key for your logical data model has more than 1 property, you should default to (multiple partitions, multiple rows for each partition).

If the key for your logical data model has only one property, you would default to (multiple partitions, one row per partition).

We have two columns in the key to separate what defines uniqueness (PartitionKey and RowKey) from what defines scalability (just PartitionKey).
In general, write and query times are less affected by how the table is partitioned.  It is affected more by whether you specify the PartitionKey and/or RowKey in the query.

So that answers the question and validates the conclusions we eventually arrived at. It seems we should always use the partition key as the primary key and optionally use the row key as a secondary key if needed.
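To make that concrete, here is a minimal sketch of what a user entity keyed this way might look like. The class and property names below are mine rather than anything from the Azure SDK; the only properties the Table Storage data model actually mandates are PartitionKey, RowKey and Timestamp.

```csharp
using System;

// Sketch only: the unique user ID doubles as the partition key (which defines
// both uniqueness and scalability) and the row key is a constant since there
// is only one row of data per user.
public class TwitterUserEntity
{
    public string PartitionKey { get; set; }  // e.g. the Twitter user name
    public string RowKey { get; set; }        // constant; one row per partition
    public DateTime Timestamp { get; set; }

    // Hypothetical user data columns
    public string ProfilePicUrl { get; set; }
    public string Location { get; set; }
    public string LatestTweet { get; set; }
    public int FollowerCount { get; set; }

    public TwitterUserEntity() { }

    public TwitterUserEntity(string userName)
    {
        PartitionKey = userName;
        RowKey = string.Empty;
    }
}
```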

In that case, the fact that items with different partition keys may or may not be stored on the same machine seems to be an implementation detail that shouldn't matter to developers since there is nothing they can do about it anyway. Right?

Now Playing: Scarface - Hand of the Dead Body


 

Categories: Web Development

I've been spending some time thinking about the ramifications of centralized identity plays coming back into vogue with the release of Facebook Connect, MySpaceID and Google's weird amalgam Google Friend Connect. Slowly I began to draw parallels between the current situation and a different online technology battle from half a decade ago.

About five years ago, one of the most contentious issues among Web geeks was the RSS versus Atom debate. On the one hand there was RSS 2.0, a widely deployed and fairly straightforward XML syndication format which had some ambiguity around the spec but whose benevolent dictator had declared the spec frozen to stabilize the ecosystem around the technology.  On the other hand you had the Atom syndication format, an up and coming XML syndication format backed by a big company (Google) and a number of big names in the Web standards world (Tim Bray, Sam Ruby, Mark Pilgrim, etc) which intended to do XML syndication the right way and address some of the flaws of RSS 2.0.

During that time I was an RSS 2.0 backer even though I spent enough time on the atom-syntax mailing list to be named as a contributor on the final RFC. My reasons for supporting RSS 2.0 are captured in my five year old blog post The ATOM API vs. the ATOM Syndication Format which contained the following excerpt

Based on my experiences working with syndication software as a hobbyist developer for the past year is that the ATOM syndication format does not offer much (if anything) over RSS 2.0 but that the ATOM API looks to be a significant step forward compared to previous attempts at weblog editing/management APIs especially with regard to extensibility, support for modern practices around service oriented architecture, and security.
...
Regardless of what ends up happening, the ATOM API is best poised to be the future of weblog editting APIs. The ATOM syndication format on the other hand...

My perspective was that the Atom syndication format was a burden on consumers of feeds since it meant they had to add yet another XML syndication format to the list of formats they supported; RSS 0.91, RSS 1.0, RSS 2.0 and now Atom. However the Atom Publishing Protocol (AtomPub) was clearly an improvement to the state of the art at the time and was a welcome addition to the blog software ecosystem. It would have been the best of both worlds if AtomPub simply used RSS 2.0 so we got the benefits with none of the pain of duplicate syndication formats.

As time has passed, it looks like I was both right and wrong about how things would turn out. The Atom Publishing Protocol has been more successful than I could have ever imagined. It not only became a key blog editing API but evolved to become a key technology for accessing data from cloud based sources that has been embraced by big software companies like Google (GData) and Microsoft (ADO.NET Data Services, Live Framework, etc). This is where I was right.

I was wrong about how much of a burden having multiple XML syndication formats would be on developers and end users. Although it is unfortunate that every consumer of XML feed formats has to write code to process both RSS and Atom feeds, this has not been a big deal. For one, this code has quickly been abstracted out into libraries on the majority of popular platforms so only a few developers have had to deal with it. Similarly, end users haven't had to deal with this fragmentation that much. At first some sites did put out feeds in multiple formats which just ended up confusing users, but that is mostly a thing of the past. Today most end users interacting with feeds have no reason to know about the distinction between Atom and RSS since for the most part there is none when you are consuming the feed from Google Reader, RSS Bandit or your favorite RSS reader.

I was reminded of this turn of events when reading John McCrea's post As Online Identity War Breaks Out, JanRain Becomes “Switzerland” where he wrote

Until now, JanRain has been a pureplay OpenID solution provider, hoping to build a business just on OpenID, the promising open standard for single sign-on. But the company has now added Facebook as a proprietary login choice amidst the various OpenID options on RPX, a move that shifts them into a more neutral stance, straddling the Facebook and “Open Stack” camps. In my view, that puts JanRain in the interesting and enviable position of being the “Switzerland” of the emerging online identity wars.


For site operators, RPX offers an economical way to integrate the non-core function of “login via third-party identity providers” at a time when the choices in that space are growing and evolving rapidly. So, rather than direct its own technical resources to integrating Facebook Connect and the various OpenID implementations from MySpace, Google, Yahoo, AOL, Microsoft, along with plain vanilla OpenID, a site operator can simply outsource all of those headaches to JanRain.

Just as standard libraries like the Universal Feed Parser and the Windows RSS platform insulated developers from the RSS vs. Atom format war, JanRain's RPX makes it so that individual developers don't have to worry about the differences between supporting proprietary technologies like Facebook Connect or Open Stack™ technologies like OpenID.

At the end of the day, it is quite likely that the underlying technologies will not matter to anyone but a handful of Web standards geeks and library developers. Instead what is important is for sites to participate in the growing identity provider ecosystem, not what technology they use to do so.

Now Playing: Yung Wun featuring DMX, Lil' Flip & David Banner - Tear It Up


 

Categories: Web Development

Last month James Governor of Redmonk had a blog post entitled Asymmetrical Follow: A Core Web 2.0 Pattern where he made the following claim

You’re sitting at the back of the room in a large auditorium. There is a guy up front, and he is having a conversation with the people in the front few rows. You can’t hear them quite so well, although it seems like you can tune into them if you listen carefully. But his voice is loud, clear and resonant. You have something to add to the conversation, and almost as soon as you think of it he looks right at you, and says thanks for the contribution… great idea. Then repeats it to the rest of the group.

That is Asymmetrical Follow.

When Twitter was first built it was intended for small groups of friends to communicate about going to the movies or the pub. It was never designed to cope with crazy popular people like Kevin Rose (@kevinrose 76,185 followers), Jason Calacanis (@jasoncalacanis 42,491), and Scobleizer (@scobleizer 41,916). Oh yeah, and some dude called BarackObama (@barackobama 141,862)

If you’re building a social network platform its critical that you consider the technical and social implications of Asymmetrical Follow. You may not expect it, but its part of the physics of social networks. Shirky wrote the book on this. Don’t expect a Gaussian distribution.

Asymmetric Follow is a core pattern for Web 2.0, in which a social network user can have many people following them without a need for reciprocity.

James Governor mixes up two things in his post which at first made it difficult to agree with its premise. The first thing he talks about is the specifics of the notion of a follower on Twitter where he focuses on the fact that someone may not follow you but you can follow them and get their attention by sending them an @reply which is then rebroadcast to their audience when they reply to your tweet. This particular feature is not a core design pattern of social networking sites (or Web 2.0 or whatever you want to call it).

The second point is that social networks have to deal with the nature of popularity in social circles as aptly described in Clay Shirky's essay Power Laws, Weblogs, and Inequality from 2003. In every social ecosystem, there will be people who are orders of magnitude more popular than others. Mike Arrington's blog is hundreds of times more popular than mine. My blog is hundreds of times more popular than my wife's. To adequately reflect this reality of social ecosystems, social networking software should scale up to being usable both by the super-popular and the long tail of unpopular users. Different social applications support this in different ways. Twitter supports this by making the act of showing interest in another user a one-way relationship that doesn't have to be reciprocated (i.e. a follower) and then not capping the number of followers a user can have. Facebook supports this by creating special accounts for super-popular users called Facebook Pages which also have a one-way relationship between the popular entity and its fans. Like Twitter, there is no cap on the number of fans a "Facebook Page" can have. Facebook differs from Twitter by forcing super-popular users to have a different representation from regular users.
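To illustrate the data model difference, here is a toy sketch (my own illustrative code, not anything from Twitter or Facebook) of an asymmetric follow store: following someone adds a one-way edge, nothing requires the other user to reciprocate, and nothing caps how many followers a user can accumulate.

```csharp
using System.Collections.Generic;

// Toy in-memory model of asymmetric follow: an edge from follower to followee
// with no reciprocity requirement and no cap on follower counts.
public class FollowGraph
{
    private readonly Dictionary<string, HashSet<string>> followersOf =
        new Dictionary<string, HashSet<string>>();

    public void Follow(string follower, string followee)
    {
        HashSet<string> fans;
        if (!followersOf.TryGetValue(followee, out fans))
        {
            fans = new HashSet<string>();
            followersOf[followee] = fans;
        }
        fans.Add(follower); // one-way edge; the followee doesn't follow back
    }

    public int FollowerCount(string user)
    {
        HashSet<string> fans;
        return followersOf.TryGetValue(user, out fans) ? fans.Count : 0;
    }
}
```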

In general, I agree that being able to support the notion of super-popular users who have lots of fellow users who are their "fans" or "followers" is a key feature that every social software application should support natively. Applications that don't do this are artificially limiting their audience and penalizing their popular users.

Does that make it a core pattern for "Web 2.0"? I guess so.

Now Playing: David Banner - Play


 

Categories: Social Software

I spent the last few days hacking on a side project that I thought some of my readers might find interesting; you can find it at http://hottieornottie.cloudapp.net 

I had several goals when embarking on this project

After a few days of hacking I'm glad to say I've achieved every goal I wanted to get out of this experiment. I'd like to thank Matt Cutts for the initial idea on how to implement this and Kevin Marks for saving me from having to write a Twitter crawler by reminding me of Google's Social Graph API.

What it does and how it works

The search experiment provides four kinds of searches

  1. The search functionality with no options checked is exactly the same as search.twitter.com

  2. Checking "Search Near Me" finds all tweets posted by people who are within 30 miles of your geographical location (requires JavaScript). Your geographical location is determined from your IP address while the geographical location of the tweets is determined from the location fields of the Twitter profiles of the authors. Nice way to find out what people in your area are thinking about local news.

  3. Checking 'Sort By Follower Count' is my attempt to jump on the authority based Twitter search bandwagon. I don't think it's very useful but it was easy to code. Follower counts are obtained via the Google Social Graph API.

  4. Checking 'Limit to People I Follow' requires you to also specify your user name and then all search results are filtered to only return results from people you follow (requires JavaScript). This feature only works for a small subset of Twitter users that have been encountered by a crawler I wrote. The application is crawling Twitter friend lists as you read this and anyone I follow should already have their friend list crawled. If it doesn't work for you, check back in a few days. It's been slow going since Twitter puts a 100 request per hour cap on crawlers.
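Conceptually, the "Search Near Me" option is just a great-circle distance check between the searcher's location and each tweet author's location. Below is a rough sketch of that test (my own illustrative code, not the app's actual implementation), using the haversine formula and a 30 mile radius.

```csharp
using System;

public static class GeoFilter
{
    private const double EarthRadiusMiles = 3959.0;
    private const double SearchRadiusMiles = 30.0;

    // Great-circle (haversine) distance between two lat/long points, in miles.
    public static double DistanceInMiles(double lat1, double lon1,
                                         double lat2, double lon2)
    {
        double dLat = ToRadians(lat2 - lat1);
        double dLon = ToRadians(lon2 - lon1);
        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2) +
                   Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2)) *
                   Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
        return EarthRadiusMiles * 2 * Math.Asin(Math.Sqrt(a));
    }

    // True if the tweet author's location is within 30 miles of the searcher.
    public static bool IsNearby(double searcherLat, double searcherLon,
                                double authorLat, double authorLon)
    {
        return DistanceInMiles(searcherLat, searcherLon,
                               authorLat, authorLon) <= SearchRadiusMiles;
    }

    private static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }
}
```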

Developing on Windows Azure: Likes

After building a small scale application with Windows Azure, there are definitely a number of things I like about the experience. The number one thing I loved was the integrated deployment story with Visual Studio. I can build a regular ASP.NET application on my local machine that uses either cloud or local storage resources and all it takes is a few mouse clicks to go from my code running on my machine to my code running on computers in Microsoft's data center, either in a staging environment or in production. The fact that the data access APIs are all RESTful makes it super easy to point the app running on your machine at either cloud storage or local storage on your machine simply by changing some base URIs in a configuration file.
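To give a flavor of how little has to change, here is a rough sketch assuming the storage base URI lives in an ordinary application setting (the setting name is mine, not the SDK's); the rest of the data access code never needs to know whether it is talking to local development storage or the cloud.

```csharp
using System;
using System.Configuration;

public static class StorageEndpoints
{
    // Reads the table storage base URI from configuration. Pointing the app at
    // local development storage versus the cloud is just a matter of editing
    // this one value in the configuration file.
    public static Uri TableStorageBaseUri
    {
        get
        {
            string endpoint = ConfigurationManager.AppSettings["TableStorageEndpoint"];
            return new Uri(endpoint);
        }
    }
}
```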

Another aspect of Windows Azure that I thought was great is how easy it is to create background processing tasks. It was very straightforward to create a Web crawler that crawled Twitter to build a copy of its social graph by simply adding a "Worker Role" to my project. I've criticized Google App Engine in the past for not supporting the ability to create background tasks so it is nice to see this feature in Microsoft's platform as a service offering. 
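Conceptually the crawler is just an infinite loop hosted in the worker role's entry point. The sketch below is a stripped-down stand-in (the queue and the Twitter API calls are placeholders, and it is not the actual Windows Azure worker role base class), but it shows the shape of the background task, including throttling to stay under Twitter's 100-requests-per-hour cap.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Stand-in for the background processing a Windows Azure "Worker Role" hosts:
// loop forever, pull a user off the work queue, fetch their friend list and
// queue up the friends so the crawl keeps expanding.
public class TwitterCrawler
{
    private readonly Queue<string> usersToCrawl = new Queue<string>();

    public void Run(string seedUser)
    {
        usersToCrawl.Enqueue(seedUser);

        while (true)
        {
            if (usersToCrawl.Count == 0)
            {
                Thread.Sleep(TimeSpan.FromSeconds(30)); // nothing to do yet
                continue;
            }

            string userName = usersToCrawl.Dequeue();
            foreach (string friend in FetchFriendList(userName))
            {
                SaveFriendEdge(userName, friend); // persist to table storage
                usersToCrawl.Enqueue(friend);     // breadth-first walk
            }

            Thread.Sleep(TimeSpan.FromSeconds(36)); // ~100 requests per hour
        }
    }

    // Placeholders for the Twitter API call and the storage write.
    private IEnumerable<string> FetchFriendList(string userName) { yield break; }
    private void SaveFriendEdge(string from, string to) { }
}
```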

Developing on Windows Azure: Dislikes

The majority of my negative experiences were related to teething problems I'd associate with this being a technology preview that still needs polishing. I hit a rather frustrating bug where half the time I tried to run my application it would end up hanging and I'd have to try again after several minutes. There were also issues with the Visual Studio integration where removing or renaming parts of the project from the Visual Studio UI didn't modify all of the related configuration files, so the app was in a broken state until I mended it by hand. Documentation was another place where there is still a lot of work to do. My favorite head-scratching moment is that there is an x-ms-Metadata-ApproximateMessagesCount HTTP header which returns the approximate number of messages in a queue. It is unclear whether "approximate" here refers to the fact that messages in the queue have an "invisibility period" between when they are popped from the queue and when they are deleted, during which they can't be accessed, or whether it refers to some other heuristic that determines the size of the queue. Then there's the fact that the documentation says you need to have a partition key and row key for each entry you place in a table but doesn't really explain why or how you are supposed to pick these keys. In fact, the documentation currently makes it seem like the notion of partition keys is an example of unnecessarily surfacing implementation details of Windows Azure to developers in a way that leads to confusion and cargo cult programming.

One missing piece is good tools for debugging your application once it is running in the cloud. When it is running on your local machine there is a nice viewer to keep an eye on the log output from your application, but once it is in the cloud your only option is to have the logs dropped to some directory in the cloud and then run one of the code samples to access those logs from your local machine. Since this is a technology preview it is expected that the tooling won't all be there, but it is a cumbersome process as it exists today. Besides accessing your debug output, there is also the question of seeing what data your application is actually creating, retrieving and otherwise manipulating in storage. You can use SQL Server Management Studio to look at your data in Table Storage on your local machine but there isn't a similar experience in the cloud. Neither blob nor queue storage have any off-the-shelf tools for inspecting their contents locally or in the cloud, so developers have to write custom code by hand. Perhaps this is somewhere the developer community can step up with some Open Source tools (e.g. David Aiken's Windows Azure Online Log Reader) or perhaps some commercial vendors will step in as they have in the case of Amazon's Web Services (e.g. RightScale)?

Outside of the polish issues and bugs, there was only one aspect of Windows Azure development I disliked: the structured data/relational schema development process. Windows Azure has a Table Storage API which provides a RESTful interface to a row-based data store similar in concept to Google's BigTable. Trying to program locally against this API is rather convoluted and requires writing your classes first and then running some object<->relational translation tools on your assemblies. This is probably a consequence of my not being a big believer in the use of ORM tools, so having to write objects before I can access my DB seems backwards to me. This gripe may just be a matter of preference since a lot of folks who use Rails, Django and various other ORM technologies seem fine with having primarily an object facade over their databases.

Update: Early on in my testing I got a The requested operation is not implemented on the specified resource error when trying out a batch query and incorrectly concluded that the Table Storage API did not support complex OR queries. It turns out that the problem was that I was doing a $filter query using the tolower function. Once I took out the tolower() it was straightforward to construct queries with a bunch of OR clauses so I could request multiple row keys at once.

I'll file this under "documentation issues" since there is a list of unsupported LINQ query operators and unsupported LINQ comparison operators but not a list of unsupported query expression functions in the Table Storage API documentation. Sorry about any confusion and thanks to Jamie Thomson for asking about this so I could clarify.
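For reference, here is roughly what such a batched lookup looks like once the tolower() call is gone. The entity type is a hypothetical sketch (only PartitionKey and RowKey come from the Table Storage data model) and the IQueryable source stands in for whatever the table storage client exposes; the point is that the LINQ expression below translates into a $filter made up of a chain of 'or' clauses.

```csharp
using System.Linq;

// Hypothetical entity shape; only PartitionKey/RowKey are mandated by the API.
public class UserEntity
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public int FollowerCount { get; set; }
}

public static class BatchLookup
{
    // Request three row keys from one partition in a single query; the Where
    // clause becomes a $filter with OR'd key comparisons on the wire.
    public static IQueryable<UserEntity> LookupThree(
        IQueryable<UserEntity> users, string partition,
        string key1, string key2, string key3)
    {
        return users.Where(u => u.PartitionKey == partition &&
                               (u.RowKey == key1 ||
                                u.RowKey == key2 ||
                                u.RowKey == key3));
    }
}
```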

Besides the ORM issue, I felt that I was missing some storage capabilities when trying to build my application. One of the features I started building before going with the Google Social Graph API was a quick way to provide the follower counts for a batch of users. For example, I'd get 100 search results from the Twitter API and would then need to look up the follower counts of each user that showed up in the results for use in sorting. However there was no straightforward way to implement this lookup service in Windows Azure. Traditionally, I'd have used one of the following options

  1. Create a table of {user_id, follower_count} in a SQL database and then use batches of ugly select statements like SELECT * FROM follower_tbl WHERE id=xxxx OR id=yyyy OR id=zzzz OR ….
  2. Create tuples of {user_id, follower_count} in an in-memory hash table like memcached and then do a bunch of fast hash table lookups to get the follower counts for each user (sketched below)
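For what it's worth, here is roughly what that second option looks like, with a plain dictionary standing in for memcached: given the authors from a page of search results, pull their follower counts and order the results by them.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SearchResultRanking
{
    // Option 2 sketch: a fast key/value lookup (a Dictionary here, memcached in
    // a real deployment) mapping user_id -> follower_count, used to order a
    // page of search results by each author's follower count.
    public static IEnumerable<string> SortByFollowerCount(
        IEnumerable<string> resultAuthors,
        IDictionary<string, int> followerCounts)
    {
        return resultAuthors.OrderByDescending(author =>
        {
            int count;
            return followerCounts.TryGetValue(author, out count) ? count : 0;
        });
    }
}
```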

Neither of these options is possible given the three data structures that Windows Azure gives you. It could be that these missing pieces are intended to be provided by SQL Data Services, which I haven't taken a look at yet. If not, the lack of these pieces of functionality will be a sticking point for developers making the switch from traditional Web development platforms.

Now Playing: Geto Boys - Gangsta (Put Me Down)


 

Categories: Personal | Programming

This morning, Jeff Atwood wrote a blog post about software piracy entitled My Software Is Being Pirated where he talks about how companies can deal with the fact that the piracy rate among their users could be as high as 90%. He writes

Short of ..

  1. selling custom hardware that is required to run your software, like the Playstation 3 or Wii
  2. writing a completely server-side application like World of Warcraft or Mint

.. you have no recourse. Software piracy is a fact of life, and there's very little you can do about it. The more DRM and anti-piracy devices you pile on, the more likely you are to harm and alienate your paying customers. Use a common third party protection system and it'll probably be cracked along with all the other customers of that system. Nobody wants to leave the front door to their house open, of course, but you should err on the side of simple protection whenever possible. Bear in mind that a certain percentage of the audience simply can't be reached; they'll never pay for your software at any price. Don't penalize the honest people to punish the incorrigible. As my friend Nathan Bowers so aptly noted:

Every time DRM prevents legitimate playback, a pirate gets his wings.

In fact, the most effective anti-piracy software development strategy is the simplest one of all:

  1. Have a great freaking product.
  2. Charge a fair price for it.

(Or, more radically, choose an open source business model where piracy is no longer a problem but a benefit -- the world's most efficient and viral software distribution network. But that's a topic for a very different blog post.)

It is interesting to note that Jeff's recommendation for an effective anti-piracy solution is actually contradicted by the example game from his post: World of Goo. The game is an excellent product and is available for ~$15, yet it is still seeing a 90% piracy rate. In fact, the most effective anti-piracy strategy is simply to route around the problem as Jeff originally stated. Specifically

  • target custom hardware platforms such as the iPhone or XBox 360 which don't have a piracy problem
  • build Web-based software

However if you do decide to go down the shrinkwrapped software route, I'd suggest casting a critical eye on any claims that highlight the benefits of the "Open Source business model" to shrinkwrapped software developers. Open Source software companies have been around for over a decade (e.g. RedHat was founded in 1995) and we now have experience as an industry with regards to what works and what doesn't work as a business model for Open Source software.

There are basically three business models for companies that make money from Open Source software. They are

  1. Selling support, consulting and related services for the "free" software (aka the professional open source business model) – RedHat
  2. Dual license the code and then sell traditional software licenses to enterprise customers who are scared of the GPL – MySQL AB
  3. Build a proprietary Web application powered by Open Source software – Google

As you scan this list, it should be clear that none of these business models actually involves making money directly from selling only the software. This is problematic for developers of shrinkwrapped, consumer software such as games because none of the aforementioned business models actually works well for them.

For developers of shrinkwrapped software, Open Source only turns piracy from a problem into a benefit if you're willing to forgo building consumer software and your software is either too complicated to use without handholding OR you can scare a large percentage of your customers into buying traditional software licenses by using the GPL instead of the BSDL.

Either way, the developers of World of Goo are still screwed.

Now Playing: The Notorious B.I.G. - Mo Money Mo Problems (feat. Mase & Puff Daddy)


 

Categories: Technology

UPDATE: It seems the problem was hardware-related. Disc unreadable errors with perfectly good discs are one of the symptoms of an impending red ring of death.


My wife bought me Gears of War 2 for Christmas from Amazon. So far I've been unable to get past the start screen of the game due to a This disc is unreadable error and attempting to install the game to the hard drive fails at 40% complete. This is the only game in my library that exhibits this problem. I also got Grand Theft Auto IV and have been able to play that fine with no problems.

Searching online indicates that this is a widespread problem.

The only consistent thing about all the threads is that I have seen no confirmed solution to the problem. The best suggestion seems to be to return the disc to the retailer and ask for an exchange for a different Gears of War 2 disc. With this option there's still a chance of getting another bum disc, and it seems this would be a difficult option to pursue with an online retailer like Amazon. There's also the option of sending my XBox in for repairs, although given that the problem only exists with this one game, it is unclear that this is a problem with my actual hardware.

Before deciding to chalk up the $60 + tax my wife spent on this game to bad luck, I thought I'd throw up this blog post just in case one of my readers has encountered this problem and found a workaround.

Merry Xmas!

Now Playing: The Notorious B.I.G. - You're Nobody (Til Somebody Kills You)


 

Categories: Video Games

December 17, 2008
@ 04:58 PM

Reading depressing layoff related blog posts like George Oates's Not quite what I had in mind and the Valley Wag's Laid-off Yahoos packing heat for Jerry Yang? reminded me that I've been meaning to post about open positions on our team for a while.

The team I work for is responsible for the "social graph", "news feed" and online presence platforms that power various Windows Live experiences. You can see some of our recent efforts in action by downloading Windows Live Essentials (beta) or visiting my profile on Windows Live and browsing around. If you are interested in building world-class software that is used by hundreds of millions of people and the following job descriptions interest you, then send me your resume.

Software Design Engineer (Developer)

The Windows Live Messenger service is the backbone of one of the world’s leading instant messaging services. The service enables hundreds of millions of users to communicate efficiently using text, voice, video and real-time status updates. This high-profile business is growing to accommodate mobile devices, social networking, web applications and other new areas.
We are seeking a developer with a fondness and talent for working on large-scale fault-tolerant distributed systems. The job involves working on the back-end components that maintain user state and route messages and notifications. In addition to improving the system's performance and resiliency, our team will tackle hard new problems such as
- Supporting new ways of addressing users (personas or aliases)
- Extending user state to support offline presence and presence views
- Creating a generic notification service
- Implementing effective request throttling and load-balancing across datacenters.

Software Design Engineer/Test (Tester)

Looking for your next big challenge? How about building the next version of the world’s largest IM and social network platform?
We are looking for a great SDET with solid design and problem-solving skills and an exceptional track record to help build the next version of the Windows Live Messenger and Social Network platform. The Messenger network is already one of the largest social networks on the planet, delivering BILLIONS of messages a day for HUNDREDS of MILLIONS of users world-wide.
The SDET role involves working on these next-generation services and proving they can be delivered to the massive scale required, with the quality our users have come to expect. Particular focus areas for this role are scalability, performance and reliability:
Scalability - building software systems to take each piece of hardware to its limits, identifying bottlenecks, removing them and pushing harder, while also proving the system can grow linearly as hardware is added (…think 1,000s of machines).
Performance - ensuring consistently fast response times across the system by smoothly managing peak traffic -- which averages in the 10s of millions of simultaneous online connections.
Reliability - building online services that remain reliable under stress and that the operations team is able to easily monitor, troubleshoot, and repair, enabling the aggressive uptime requirements we aim for.

Email your resume to dareo@msft.com (replace msft with microsoft) if the above job descriptions sound like they are a good fit for you. If you have any questions about what working here is like, you can send me an email and I'll either follow up via email or my blog to answer any questions of general interest [within reason].

Now Playing: Rascal Flatts - Fast Cars and Freedom


 

In a post entitled CS Broke, Joe Gregorio writes

Computer Science, at a deep and fundamental level, is broken, and that applies not only to software but to hardware. One of the reasons that I have this feeling is that after programming for the past 25 years the field hasn't really changed. The conversations aren't any different. You could substitute 'Windows API' or 'Borland CGI' for 'HTML and CSS' and you'd be having the same exact conversations I had 15 or 20 years ago. We still struggle with leaks, be it memory, or file handles, or threads, or whatever. We still have race conditions. We still struggle with software that grows linearly in features but exponentially in complexity.

Two things came to mind as I read this

  1. If your definition of computer science includes HTML & CSS or Win32 then you're doing it wrong. In the words of Edsger Dijkstra, computer science is no more about computers than astronomy is about telescopes.

  2. Even if you limit the discussion to computer programming, does it also mean that civil engineering is broken because people today still have to discuss and solve the same kinds of problems faced by the builders of the Great Wall of China or the Roman aqueducts?

Now Playing: Ice Cube - Go to Church


 

Categories: Programming

A few days ago, the top news story on Techmeme was the fact that Google launched Google Friend Connect and Facebook announced the final version of Facebook Connect within minutes of each other. Reading up on both announcements, it is interesting to note how most of the coverage is about who will win the race to dominate the Web as opposed to what end user value is actually being created by these technologies.

It was somewhat depressing until I read Dave Winer's brilliant post Soon it will be time to start over, again which contains the following excerpt

We're now reaching the end of a cycle, we're seeing feature wars. That's what's going on between Facebook and Google, both perfectly timing the rollouts of their developer proposition to coincide with the others' -- on the very same day! I don't even have to look at them and I am sure that they're too complicated. Because I've been around this loop so many times. The solution to the problem these guys are supposedly working on won't come in this generation, it can only come when people start over. They are too mired in the complexities of the past to solve this one. Both companies are getting ready to shrink. It's the last gasp of this generation of technology.

But the next one can't be far away now. It will be exhilirating!!

Remember how great Google was when it first appeared?

Remember how great Netscape was, and before that Apple, and I know you guys won't like this, but Microsoft offered us some great new places to play. I remember finding out that their OS address space in 1981 was 640K. That was a lot to guy who was spending huge amounts of time trying to cram a 256K app into 48K.

The trick in each cycle is to fight complexity, so the growth can keep going. But you can't keep it out, engineers like complexity, not just because it provides them job security, also because they really just like it. But once the stack gets too arcane, the next generation throws their hands up and says "We're not going to deal with that mess."

We're almost there now. ;->

The value of Facebook Connect to Facebook is obvious. They get to become a centralized identity provider for the Web, including the benefit of tracking every single time one of their users logs in on a partner site, which lets them build an even better advertising profile of their users. Similarly, the value to the customers of the sites adopting it seems clear at first. Below are the claimed benefits of Facebook Connect to users from my initial perusal

  1. One login, multiple sites. No need to create a new account on partner sites.
  2. Account information such as profile picture, location and other fields on the partner site can be prepopulated from Facebook
  3. Bring your social graph with you to partner sites. 
  4. Let your friends on Facebook know what you are doing on partner sites. Updates show up on your profile but do not go in your friends' news feeds (they go in their live feed instead). 

Where things get interesting is that none of these benefits require a proprietary and centralized approach like Facebook has done. If Facebook implemented OpenID and OpenID attribute exchange, they could have given their users the benefits of #1 and #2 using widely adopted industry standards.  For #3, there is the burgeoning Portable Contacts effort to define a set of standard APIs for accessing the social graph that supports the key data portability principles around this information. As for broadcasting your updates from one site to another, FriendFeed has shown how that can be done using standard technologies like RSS, Atom and XMPP. 

Ignoring the fact that Facebook Connect is a proprietary and centralized approach instead of being based on open standards, there are still other points worthy of debate. When trying out sites like CitySearch beta with Facebook Connect, the experience is that I am connected with all of my Facebook friends who also use CitySearch. There is the genuine question of whether users really want to use a single friends list across every site regardless of context (e.g. interacting with the exact same people on LinkedIn, MySpace and XBox Live) or whether they want to have universal access to any of their friends lists and bridge them when necessary.

Yesterday on Twitter, I mentioned that Facebook Connect is the wrong direction to go on the Web for the reasons above. I also called Google Friend Connect a misguided "me too" effort for trying to copy Facebook's strategy and glomming an AJAX widget play on top of it. Kevin Marks, an evangelist at Google, challenged my statement with the following response

@Carnage4Life the problem for users is re-entering data and restating friends for each app. For developers its becoming social without this

If that is truly the problem, how does the technology in the video below solve the problem any better than the combination of OpenID and Portable Contacts?

As with OpenSocial, Google has fallen in love with its role as a spoiler when it comes to Facebook's platform efforts without stopping to think whether it actually makes sense to be aping Facebook's strategies in the first place. Monkey see, monkey do.

This will be the death of them if they aren't careful. Once you become a follower and define yourself by reacting to others' actions, it is hard to step back into a leadership role both in the industry and even within one's corporate culture.

Now Playing: Britney Spears - Circus


 

One feature that you will not find in Windows Live's What's New list, which shows a feed of the activities from a user's social network, is inline comments. A number of sites that provide users with activity feeds from their social network, such as Facebook and FriendFeed, allow comments to be made directly on news items in the feed. These comments end up showing up as part of the activity feed and are visible to anyone who can view the feed item.

When Rob and I were deciding upon the key functionality of the What's New feed for the current release of Windows Live, we voted against inline comments for two reasons.

The key reason is that we want the feed to be about what the people in your network are doing and not what people you don't know are doing or saying. However, with the Facebook feed I often have lengthy threads from people I don't know in my feed taking up valuable space above the fold. For example,

 

In the above screenshot, I find it rather awkward that a huge chunk of my feed is being taken up by comments from people I don't know who are from Randy's network. Besides the social awkwardness it creates, there is another issue with the above screenshot. Given that there is limited real estate for showing your feed, it seems counterproductive for it to be dominated by comments from people you don't know, which are never as interesting as actual feed items.

For the second reason, let's look at a screenshot of an activity feed from FriendFeed

In the above screenshot there are 24 comments on the feed item representing Robert Scoble's blog post. These are 24 comments that could have been posted on his blog but aren't. The more sites Robert imports his blog feed into, the more it fractures and steals away the conversation from his blog post. This is in addition to the fact that there is some confusion as to where people should leave comments on his blog post. I've had people get confused about whether to respond to my posts as a comment on my blog, on FriendFeed or on Facebook, and it didn't seem helpful for us to add yet another decision point to the mix.

For these reasons, we don't have inline commenting in the What's New list in Windows Live. This isn't to say this is an irreversible decision. It has been pointed out that for feed items that don't have their own comment threads (e.g. status messages) it might be useful to have inline commenting. In addition, I'm sure there are some people who believe that the benefits of inline commenting outweigh the drawbacks that we've mentioned above. I'd love to hear what users of Windows Live think about the above decision and thought process behind it. Let me know in the comments. 

PS: If you are interested in more behind the scenes looks at some of the big and small decisions around the What's New feature in Windows Live, you should read Rob Dolin's ongoing series of posts entitled Series: What's New in Windows Live “What’s New” and Why.

Now Playing: Guns N' Roses - Chinese Democracy


 

Categories: Social Software | Windows Live