I’ve been busy with work and spending time with my son so I haven’t been as diligent as I should be with the blog. Until my time management skills get better, here are some thoughts from a guest post I wrote recently for the Live Services blog.

Dare Obasanjo here, from the Live Services Program Management team. I'd like to talk a bit about the work we are doing to increase interoperability across the "Social Web."

The term The Social Web has been increasingly used to describe the rise of the Web as a way for people to interact, communicate and share with each other using the World Wide Web. Experiences that were once solitary such as reading news or browsing one's photo albums have now been made more social by sites such as Digg and Flickr. With so many Web sites offering social functionality, it has become increasingly important for people to be able to not only be able to connect and share with their friends on a single Web site but also to take these relationships and activities with them wherever they go on the Web.

With the recent update to Windows Live, we are continuing with the vision of enabling our 500 million customers to share and connect with the people they care about regardless of what services they use. Our customers can now invite their contacts from MySpace (the largest U.S. social networking site) and Hi5 to join them on Windows Live in a safe manner without having to resort to using the the password anti-pattern. These sites join Facebook and LinkedIn as social networks from which people can import their social graph or friend list into Windows Live.

image

In addition to interoperating with social networks to bridge relationships across the Web, we are also always working on enabling customers to share the content they are find interesting or activities they are participating in from all over the Web with their friends who use Windows Live services like Hotmail and Messenger. Customers of Windows Live can now add activities from over thirty different online services to their Windows Live profile including social networking sites like Facebook, photo sharing sites like Smugmug & Photobucket, social music sites like last.fm & Pandora, social bookmarking sites like Digg & Stumbleupon and much more.

We are also happy to announce today that in the coming months, MySpace customers will be able to share activities and updates from MySpace with their Windows Live network.

Below is a screenshot of some of the updates you might find on my profile on Windows Live 

image

These recent announcements bring us one step closer to a Social Web where interoperability is the norm instead of the exception. One of the most exciting things about our recent release is how much of the behind-the-scene integration is done using community driven technologies such as the Atom syndication format, Atom Activity Extensions, OAuth, and Portable Contacts. These community driven technologies are moving to ensure that the Social Web is a web of interconnected and interoperable web sites, not a set of competing walled gardens desperately clutching to customer data in an attempt to invent Lock-In 2.0 

As we look towards the future, I believe that the aforementioned standards around contact exchange, social activity streams and authorization are just the first steps. When we look at all the capabilities across the Web landscape it is clear that there are scenarios that are still completely broken due to lack of interoperability across various social websites. You can expect more from Windows Live when it comes to interoperability and the Social Web.

Just watch this space.

Note Now Playing: Eminem - We Made You Note


 

Categories: Social Software | Windows Live

Joe Gregorio, one of the editors of RFC 5023: The Atom Publishing Protocol, has declared it a failure in his blog post titled The Atom Publishing Protocol is a failure where he writes

The Atom Publishing Protocol is a failure. Now that I've met by blogging-hyperbole-quotient for the day let's talk about standards, protocols, and technology… AtomPub isn't a failure, but it hasn't seen the level of adoption I had hoped to see at this point in its life.

Thick clients, RIAs, were supposed to be a much larger component of your online life. The cliche at the time was, "you can't run Word in a browser". Well, we know how well that's held up. I expect a similar lifetime for today's equivalent cliche, "you can't do photo editing in a browser". The reality is that more and more functionality is moving into the browser and that takes away one of the driving forces for an editing protocol.

Another motivation was the "Editing on the airplane" scenario. The idea was that you wouldn't always be online and when you were offline you couldn't use your browser. The part of this cliche that wasn't put down by Virgin Atlantic and Edge cards was finished off by Gears and DVCS's.

The last motivation was for a common interchange format. The idea was that with a common format you could build up libraries and make it easy to move information around. The 'problem' in this case is that a better format came along in the interim: JSON. JSON, born of Javascript, born of the browser, is the perfect 'data' interchange format, and here I am distinguishing between 'data' interchange and 'document' interchange. If all you want to do is get data from point A to B then JSON is a much easier format to generate and consume as it maps directly into data structures, as opposed to a document oriented format like Atom, which has to be mapped manually into data structures and that mapping will be different from library to library.

The Atom effort rose up around the set of scenarios related to blogging based applications at the turn of the decade; RSS readers and blog editing tools. The Atom syndication format was supposed to be a boon for RSS readers while the Atom Publishing Protocol was intended to make thing better for blog editing tools. There was also the expectation that the format and protocol were general enough that they would standardize all microcontent syndication and publishing scenarios.

The Atom syndication format has been as successful or perhaps even more successful than originally intended because it's original scenarios are still fairly relevant on today's Web. Reading blogs using feed readers like Google Reader, Outlook and RSS Bandit is still just as relevant today as it was six or seven years ago. Secondly, interesting new ways to consume feeds have sprung in the form of social aggregation via sites such as FriendFeed. Also since the Atom format is actually a generic format for syndicating microcontent, it has proved useful as new classes of microcontent have shown up on the Web such as streams of status updates and social network news feeds. Thanks to the Atom syndication format's extensibility it is being applied to these new scenarios in effective ways via community efforts such as ActivityStrea.ms and OpenSocial.

On the other hand, Joe is right that the Atom Publishing Protocol hasn't fared as well with the times. Today, editing a blog post via a Web-based blog editing tool like the edit box on a site like Blogger is a significantly richer experience than it was at the turn of the century. The addition of features such as automatically saving draft posts has also worked towards obviating some of the claimed benefits of desktop applications for editing blogs. For example, even though I use Windows Live Writer to edit my blog I haven't been able to convince my wife to switch to using it for editing her blog because she finds the Web "edit box" experience to be good enough. The double whammy comes from the fact that although new forms of microcontent have shown up which do encourage the existence of desktop tools (e.g. there are almost a hundred desktop Twitter apps and the list of Facebook desktop applications is growing rapidly), the services which provide these content types have shunned AtomPub and embraced JSON as the way to expose APIs for rich clients to interact with their content. The primary reason for this is that JSON works well as a protocol for both browser based client apps and desktop apps since it is more compatible with object oriented programming models and the browser security model versus an XML-based document-centric data format.

In my opinion, the growth in popularity of object-centric JSON over document-centric XML as the way to expose APIs on the Web has been the real stake in the heart for the Atom Publishing Protocol.

FURTHER READING

  • JSON vs. XML: Browser Programming Models – Dare Obasanjo on why JSON is more attractive than XML for Web-based mashups due to presenting a friendlier programming model for the browser. 

  • JSON vs. XML: Browser Security Model -   Dare Obasanjo on why JSON is more attractive than XML for Web-based mashups due to circumventing some of the browser security constraints placed on XML.

Note Now Playing: Lady Gaga - Poker Face Note


 

April 13, 2009
@ 12:14 AM

Note Now Playing: Jay-Z - A Dream Note


 

Categories: Personal

Todd Hoff over on the high scalability blog has an interesting post along the vein of my favorite mantra of Disk is the new tape titled Are Cloud Based Memory Architectures the Next Big Thing? where he writes

Why might cloud based memory architectures be the next big thing? For now we'll just address the memory based architecture part of the question, the cloud component is covered a little later.

Behold the power of keeping data in memory:


Google query results are now served in under an astonishingly fast 200ms, down from 1000ms in the olden days. The vast majority of this great performance improvement is due to holding indexes completely in memory. Thousands of machines process each query in order to make search results appear nearly instantaneously.

This text was adapted from notes on Google Fellow Jeff Dean keynote speech at WSDM 2009.

Google isn't the only one getting a performance bang from moving data into memory. Both LinkedIn and Digg keep the graph of their network social network in memory. Facebook has northwards of 800 memcached servers creating a reservoir of 28 terabytes of memory enabling a 99% cache hit rate. Even little guys can handle 100s of millions of events per day by using memory instead of disk.

The entire post is sort of confusing since it seems to mix ideas that should be two or three different blog posts into a single entry. Of the many ideas thrown around in the post, the one I find most interesting is highlighting the trend of treating in-memory storage as a core part of how a system functions not just as an optimization that keeps you from having to go to disk.

The LinkedIn architecture is a great example of this trend. They have servers which they call The Cloud whose job is to cache the site's entire social graph in memory and then have created multiple instances of this cached social graph. Going to disk to satisfy social graph related queries which can require touching data for hundreds to thousands of users is simply never an option. This is different from how you would traditionally treat a caching layer such as ASP.NET caching or typical usage of memcached.

To build such a memory based architecture there are a number of features you need to consider that don't come out of the box in caching product like memcached. The first is data redundancy which is unsupported in memcached. There are forty instances of LinkedIn's in-memory social graph which have to be kept mostly in sync without putting to much pressure on underlying databases. Another feature common to such memory based architectures that you won't find in memcached is transparent support for failover. When your data is spread out across multiple servers, losing a server should not mean that an entire server's worth of data is no longer being served out of the cache. This is especially of concern when you have a decent sized server cloud because it should be expected that servers come and go out of rotation all the time given hardware failures. Memcached users can solve this problem by using libraries that support consistent hashing (my preference) or by keeping a server available as a hot spare with the same IP address as the downed server. The key problem with lack of native failover support is that there is no way to automatically rebalance the workload on the pool of servers are added and removed from the cloud.

For large scale Web applications, if you're going to disk to serve data in your primary scenarios then you're probably doing something wrong. The path most scaling stories take is that people start with a database, partition it and then move to a heavily cached based approach when they start hitting the limits of a disk based system. This is now common knowledge in the pantheon of Web development. What doesn't yet seem to be part of the communal knowledge is the leap from being a cache-based with the option to fall back to a DB to simply being memory-based. The shift is slight but fundamental. 

FURTHER READING

  • Improving Running Components at Twitter: Evan Weaver's presentation on how Twitter improved their performance by two orders of magnitude by increasing their usage of caches and improving their caching policies among other changes. Favorite quote, "Everything runs from memory in Web 2.0".

  • Defining a data grid: A description of some of the features that an in-memory architecture needs beyond simply having data cached in memory.

Note Now Playing: Ludacris - Last Of A Dying Breed [Ft. Lil Wayne] Note


 

Categories: Web Development

Last week, Digg announced the launch of the DiggBar which manages to combine two trends that Web geeks can't stand. It is both a URL shortener (whose problems are captured in the excellent post by Joshua Schachter on URL shorteners) and brings back the trend of one website putting another's content in a frame (which has detailed criticism in the wikipedia article on framing on the World Wide Web). 

The increasing popularity of URL shortening services has been fueled by the growth of Twitter. Twitter has a 140 character limit on posts on the site which means users sharing links on the site often have to find some way of shortening URLs to make their content fit within the limit. From my perspective, this is really a problem that Twitter should fix given the amount of collateral damage the growth of these services may end up placing on the Web.

Some Web developers believe this problem can be solved by the judicious use of microformats. One such developer is Chris Shiflett who has written a post entitled Save the Internet with rev="canonical" which states the following

There's a new proposal ("URL shortening that doesn't hurt the Internet") floating around for using rev="canonical" to help put a stop to the URL-shortening madness. It sounds like a pretty good idea, and based on some discussions on IRC this morning, I think a more thorough explanation would be helpful. I'm going to try.

This is easiest to explain with an example. I have an article about CSRF located at the following URL:

http://shiflett.org/articles/cross-site-request-forgeries

I happen to think this URL is beautiful. :-) Unfortunately, it is sure to get mangled into some garbage URL if you try to talk about it on Twitter, because it's not very short. I really hate when that happens. What can I do?

If rev="canonical" gains momentum and support, I can offer my own short URL for people who need one. Perhaps I decide the following is an acceptable alternative:

http://shiflett.org/csrf

Here are some clear advantages this URL has over any TinyURL.com replacement:

  • The URL is mine. If it goes away, it's my fault. (Ma.gnolia reminds us of the potential for data loss when relying on third parties.)
  • The URL has meaning. Both the domain (shiflett.org) and the path (csrf) are meaningful.
  • Because the URL has meaning, visitors who click the link know where they're going.
  • I can search for links to my content; they're not hidden behind an indefinite number of short URLs.

There are other advantages, but these are the few I can think of quickly.

Let's try to walk through how this is expected to work. I type in a long URL like http://www.25hoursaday.com/weblog/2009/03/22/VideoStandardsForAggregatingActivityFeedsAndSocialAggregationServicesAtMIX09Conference.aspx into Twitter. Twitter allows me to post the URL and then crawls the site to see if it has a link tag with a rev="canonical" attribute. It finds one and then replaces the short URL with something like http://www.25hoursaday.com/weblog/MIX09talk which is the alternate short URL I've created for my talk. What could go wrong? Smile

So for this to solve the problem, every site that could potentially be linked to from Twitter (i.e. every website in the world) needs to run their own URL shortening service. Then Twitter needs to make sure to crawl the website behind every URL in every tweet that flows through the system.  Oh yeah, and the fact is that the URLs still aren't as efficient as those created by sites like http://tr.im unless everyone buys a short domain name as well.

Sounds like a lot of stars have to align to make this useful to the general populace and not just a hack that is implemented by a couple dozen web geeks.  

Note Now Playing: Nas - Hate Me Now (feat. Puff Daddy) Note


 

Categories: Web Development