December 1, 2008
@ 01:58 PM

Over the weekend, Tim O'Reilly wrote a post entitled Why I Love Twitter where he talks about some of the things he finds compelling about Twitter. Here's my list:

  1. Thanks to APIs, Everyone Experiences the Service Differently: Great social software fits itself into the lifestyle and personality of its users instead of the other way around. Whenever I talk to Twitter users I am surprised to learn how differently they use the service. For example, I primarily read and write to Twitter from a Vista sidebar gadget (Twadget) which to many of my coworkers seems weird. Every time I talk to a coworker, I seem to learn a new way of using Twitter from desktop clients like Twhirl and Twitterrific to consuming it on your mobile phone via SMS or a dedicated app like TinyTwitter. Then there are people whose main interface to Twitter is other Web sites either via widgets such as the Facebook Twitter application or aggregators like FriendFeed. And so on...

    The best experience is when you start chaining some of these tools together. I became sold on Twitter once I realized it gave me a simple way to provide status updates to my social network on Facebook (and soon on Windows Live) right from my desktop or my favorite RSS reader.

  2. Real Reactions to Real News in Real Time: I haven't found a better way than http://search.twitter.com to read people's reactions to breaking news as it unfolds.  I used to think that blog search engines like Technorati and Google Blog Search were the best way to keep on top of the Web in real time but Twitter has put them all to shame. It's no surprise that even CNN is now pointing out Twitter's ability to capture the Zeitgeist in articles like Tweeting the terror: How social media reacted to Mumbai.

  3. Protected Tweets for the Privacy Conscious: One thing I've found surprising is that there are a large number of people who don't use Twitter as the cutting room floor for their blog the way Tim O'Reilly or I do. I know a bunch of people who follow a dozen or so of their close friends and use Twitter as a way to keep them updated on their daily lives and organize lunch/dinner/drinks. These people often have their tweets protected so only their friends can see them. This one feature makes Twitter less about micro-blogging and more about micro-social networking in my mind's eye.

Now it's your turn. Why do you love Twitter?

Note Now Playing: Kanye West - Amazing (feat. Young Jeezy) Note


 

Categories: Social Software

My RSS reader is buzzing with a lot of hype around Facebook Connect this morning. The lead story seems to be the New York Times article entitled Facebook Aims to Extend Its Reach Across the Web which announces that a number of popular sites are about to adopt the technology. The article is excerpted below

Facebook Connect, as the company’s new feature is called, allows its members to log onto other Web sites using their Facebook identification and see their friends’ activities on those sites. Like Beacon, the controversial advertising program that Facebook introduced and then withdrew last year after it raised a hullabaloo over privacy, Connect also gives members the opportunity to broadcast their actions on those sites to their friends on Facebook.

In the next few weeks, a number of prominent Web sites will weave this service into their pages, including those of the Discovery Channel and The San Francisco Chronicle, the social news site Digg, the genealogy network Geni and the online video hub Hulu.

MySpace, Yahoo and Google have all announced similar programs this year, using common standards that will allow other Web sites to reduce the work needed to embrace each identity system. Facebook, which is using its own data-sharing technology, is slightly ahead of its rivals.

This set of partners is definitely higher profile than the last list of Facebook Connect adopters and yet I still have to wonder how this is eventually going to shake out. Even with this set of partners there are still two big hurdles Facebook has to surmount. The first is just getting users to connect their identities on different sites with their Facebook identity. Just having the ability to connect a Digg account and a Facebook account doesn't mean users will adopt the feature. I assume this is why the Facebook Beacon automatically linked a user's account on a partner site to their Facebook account in the first place. How this is presented to users on participating sites will be key to its adoption and this is mostly out of Facebook's control.

The other challenge that Facebook Connect will face is how to prevent it from being tarred with the same "centralized identity service" brush that Microsoft's Passport got tarred with at the turn of the century. Back in the year 2000, Joel Spolsky wrote Does Issuing Passports Make Microsoft a Country? which began as follows

Am I the only one who is terrified about Microsoft Passport? It seems to me like a fairly blatant attempt to build the world's largest, richest consumer database, and then make fabulous profits mining it. It's a terrifying threat to everyone's personal privacy and it will make today's "cookies" seem positively tame by comparison. The scariest thing is that Microsoft is advertising Passport as if it were a benefit to consumers, and people seem to be falling for it! By the time you've read this article, I can guarantee that I'll scare you into turning off your Hotmail account and staying away from MSN web sites.

These sentiments never went away and by 2005 Microsoft had lost some of its most prominent Passport partner sites. The service has since been rebranded Windows Live ID and is now primarily a unified identity system for Microsoft sites as opposed to being a single sign-on service for the entire Web. It may be that Microsoft was ahead of its time (as I've argued it was with Hailstorm and other initiatives) but the arguments against centralized identity systems have seemed pretty convincing in the past. In addition, I suspect that developers will start asking questions when they realize that they have to support one proprietary technology for Facebook Connect, something different for MySpace Data Availability and yet another for Google Friend Connect. How many sign-in buttons will end up adorning these sites? http://beta.citysearch.com already has two sign-in links; will that expand to four as they move to support signing in with your MySpace account and your Google Friend Connect-enabled account? Or will services decide to pick a social network to side with to the exclusion of all others? It's beginning to remind me of the high definition DVD format wars and not in a good way.

Interesting times indeed.

Note Now Playing: Kanye West - Welcome To Heartbreak (feat. Kid Cudi) Note


 

Scott Watermasysk has a great high-level comparison of the cloud computing offerings from Amazon, Google and Microsoft in his post Cloud Options - Amazon, Google, & Microsoft. Below are some excerpts from his review

Amazon (AWS)
  • Most mature offering of the three.
Google (AppEngine)
  • I get the sense that Google is trying to appeal to a small and focused audience (well, as small as Google can). There is nothing wrong with this approach, but I think long term I would feel handcuffed on their platform.
Microsoft (Windows Azure)
  • Microsoft still has a lot of "execution" to complete, but overall I am thoroughly impressed with the total breadth of their offering.

If you are interested in this space you should read Scott's entire post. I was thinking of doing a similar comparison but Scott's post hits the highs and lows of each service. I completely agree with his analysis: Amazon provides a mature offering, but I balk at the complexity of managing and deploying my own VM images; Google's offering seems incomplete, and it is bothersome that they do not provide any Web services (SOAP or REST); Microsoft has an ambitious offering which combines the ease of use of Google's offering with a more complete set of services, but the proof will be in the pudding since it isn't yet broadly available.

This is an excellent review by Scott Watermasysk and is definitely worth sharing.

Note Now Playing: Trey Songz - Can't Help But Wait (Remix) feat. Jay Read Note


 

…then don't do that.

I was somewhat amused by the following description of a "security flaw" in Twitter today from Brian Shaler in his post Twitter Security Issue where he writes

I recently discovered a serious security issue on Twitter. Let me tell you the story.

This is where it gets SERIOUS
Let’s imagine, hypothetically, that you give your password to a 3rd party application. If the application’s owner uses that password once and saves the session cookie, they can store the session cookie and re-create it at any time in the future even if you change your password (There are even browser plug-ins that allow you to read and write cookies).

This means they can get back into your account whenever they want, indefinitely. They can post tweets, read your DMs, follow other users on your behalf, etc.

How to stay safe
As far as I know, there is nothing you can do to prevent this from happening to you, aside from never giving anyone or any application your password.

Twitter needs to use a smarter session cookie that is in some way linked to the user’s password or have another way of killing other sessions if you log out. Twitter should also consider using per-user API keys for users to give to 3rd party applications, instead of authenticating with your password.

This is one of those posts I both agree and disagree with at the same time. I agree that the underlying problem is that Twitter encourages the password anti-pattern with their APIs. Today, the Twitter API only supports HTTP Basic authentication, which means that applications are expected to collect people's usernames and passwords if they want to interact with the API.
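To make this concrete, here is a minimal sketch of what posting a status update looked like for a desktop client at the time, with the user's actual credentials supplied over HTTP Basic authentication. The endpoint and parameter name are shown from memory, so treat them as illustrative rather than authoritative.

using System;
using System.Net;

class TwitterBasicAuthSketch
{
    static void Main()
    {
        // The application has to collect and store the user's real Twitter
        // username and password -- the password anti-pattern in action.
        var client = new WebClient();
        client.Credentials = new NetworkCredential("someuser", "theirpassword");
        client.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";

        // Endpoint roughly as it existed circa 2008 (illustrative).
        string body = "status=" + Uri.EscapeDataString("Posting on the user's behalf");
        string response = client.UploadString("http://twitter.com/statuses/update.xml", "POST", body);
        Console.WriteLine(response);
    }
}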

The problem with Twitter's approach is called out in Brian Shaler's blog post. It means every application that accesses a user's Twitter account on their behalf gets the keys to the kingdom in a non-revocable way, unless the user changes their password AND Twitter comes up with some scheme where they invalidate all session cookies that were authenticated with the old password. However this is a hack. The proper solution is for applications to not require a user's credentials to access their data or perform actions on their behalf.

There are many services that have implemented such solutions today including Google AuthSub, Yahoo! BBAuth, Windows Live DelAuth, AOL OpenAuth, the Flickr Authentication API, the Facebook Authentication API and others. There is also OAuth which is an attempt to create a standard protocol for delegating authority to an application so that apps don't have to learn a different scheme for each Web site they access.
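The practical difference for application developers is that with any of these schemes the application ends up holding a revocable token instead of the user's password. The sketch below is generic pseudocode for that idea rather than the wire format of OAuth or any of the services listed above; the header scheme, URL and token value are made up for illustration.

using System;
using System.Net;

class DelegatedAuthSketch
{
    static void Main()
    {
        // Hypothetical: the user signed in directly with the service and the
        // application was handed an opaque token scoped to this application.
        string delegatedToken = "opaque-token-issued-by-the-service";

        var request = (HttpWebRequest)WebRequest.Create("http://api.example.com/statuses/update");
        request.Method = "POST";

        // The application never sees the password, and the service can revoke
        // this token at any time without the user having to change their password.
        request.Headers["Authorization"] = "DelegatedToken token=\"" + delegatedToken + "\"";

        // ... write the request body and read the response as usual ...
    }
}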

So the bug isn't that Twitter doesn't have checks in place to invalidate session cookies after passwords have been changed (which is a good idea for defense in depth) but instead that Twitter encourages its users to hand out their credentials to any application that asks for them in the first place. The bottom line is that once you give your password to another entity, all bets are off. So don't do that.

PS: I'm rather stunned that Twitter plans to continue this practice indefinitely given the following excerpt from their developer wiki

At the time of writing, the Twitter engineers are working on an additional authentication scheme similar to Google’s AuthSub or Flickr’s API Authentication.  This will have the added benefit of registering third-party applications to better promote them on the site.  The Development Talk group will be notified when this authentication scheme is ready for testing.  Note that this new authentication scheme will be optional, not mandatory, once available.

Note Now Playing: John Legend - Can't Be My Lover Note


 

Categories: Web Development

Early this morning I found out that the new CitySearch beta site uses Facebook Connect to allow Facebook users to sign in to the site and bring their social network with them to CitySearch. Below is a screenshot of the sign-in experience

A user can choose to sign in with their Facebook credentials; if they do, they can bring their social network and profile info from Facebook to CitySearch, with the option of having some of their activities on CitySearch republished to Facebook.

Here's what it looks like to view my friends page on CitySearch after signing in with my Facebook credentials

From the screenshot, I have five Facebook friends who have also associated their Facebook profile with CitySearch.

A feature like Facebook Connect is something I've always wanted to see implemented but I've always gotten hung up on the details. The first stumbling block is the chicken and egg problem. Connecting your identity on multiple sites is only valuable if all your friends are doing it as well. From the above screenshot only 5 out of my 472 friends have linked their accounts, and three of them are Facebook employees who work on Facebook Connect. Despite that, I think Facebook could incentivize users to link their accounts if it improves their user experience. For example, CitySearch isn't currently picking up that I live in Seattle even though it pulled in my Facebook profile information.

Another problem I saw with this sort of technology is that sites would be reluctant to tie their user authentication to another site (echoes of the Microsoft Passport is Evil fallout from the turn of the century) especially given all the hype behind decentralized identity solutions like OpenID. I believe Facebook has gotten around this concern in a couple of ways. The first is that Facebook Connect provides the concrete benefit of access to a user's mini-feed and thus prime real estate on the user's Facebook profile. Secondly, it allows sites to bootstrap their social networks off of one of the most successful social networking sites in the world. Finally, Facebook's brand doesn't have the "baggage" that other companies that have tried to play in this space have had to struggle against.

Kudos to my friend Mike Vernal and all the other folks at Facebook who worked on Connect. This is just plain brilliant. Much love.

Note Now Playing: The Game - Nice Note


 

Categories: Social Software

Disclaimer: What follows are my personal impressions from investigating the community technology preview version of the Live Framework (LiveFX). It is not meant to be an official description of the project from Microsoft; you can find that here.

At Microsoft's recent Professional Developer Conference, a new set of Web services called Live Framework (aka LiveFX) was unveiled. As I've spent the past year working on platforms for user experiences in Windows Live, I actually haven't been keeping up to date with what's going on in developer API land when it comes to programming against our services. So I decided to check out the Live Framework website and find out exactly what was announced.

What is it?

Although the main website is somewhat light on details, I eventually gleaned enough information from the Live Framework SDK documentation on MSDN to conclude that LiveFX consists of the following pieces

  1. A Resource Model: A set of RESTful APIs for interacting with Live Mesh and Windows Live data.
  2. Libraries: A set of libraries for the .NET Framework, Silverlight and Javascript for accessing the REST APIs.
  3. The Live Operating Environment: A local web server that implements #1 above so it can be programmed against using #2 above.

The Scary Architecture Diagram

This diagram tries to capture all the ideas in LiveFX in a single image. I found it somewhat overwhelming, and after learning more about LiveFX I consider it to be a rather poor way of conveying its key concepts. It doesn't help that this diagram is somewhat aspirational given that some key pieces of the diagram are missing from the current technology preview.

The Resource Model

Live Mesh and Windows Live data is exposed by LiveFX as a set of resources identified by URIs which can be interacted with via the Atom Publishing Protocol (RFC 5023). Relationships between resources are exposed as hyperlinks between resources. The hierarchical data model currently exposed in the CTP is shown in the diagram below taken from the MSDN documentation.

It should be noted that although AtomPub is the primary protocol for interacting with resources in LiveFX, multiple serialization formats can be used to retrieve data from the service including Atom, RSS, Plain Old XML (POX), JSON or even binary XML.

Since LiveFX is a fully compliant implementation of the Atom Publishing Protocol, one can browse to the service document of a user's Mesh or other top level resource and traverse links to various Atom collections and feeds in the hierarchy. Below is a screenshot of the LiveFX resource browser showing the service document for a user's Mesh with the links to various child collections exposed as hyperlinks.
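For instance, retrieving that service document is just an authenticated HTTP GET against the root of the user's Mesh. The sketch below uses the base URI that appears in the documentation excerpts later in this post; how the Authorization header is populated is omitted since it depends on the delegated authentication token obtained when the user signs in.

using System;
using System.IO;
using System.Net;

class ServiceDocumentSketch
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("https://user-ctp.windows.net/V0.1/Mesh");
        request.Method = "GET";
        request.Accept = "application/atomsvc+xml";   // ask for an AtomPub service document
        // request.Headers["Authorization"] = "...delegated authentication token...";

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            // The response is an AtomPub service document whose collection links
            // (MeshObjects, DataFeeds, etc.) can then be followed from the client.
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}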

Besides supporting multiple serialization formats, there are a number of other features of LiveFX that separate it from a vanilla implementation of the Atom Publishing Protocol.

  • Synchronization via FeedSync: Mesh resources in LiveFX can be synchronized using FeedSync (formerly Simple Sharing Extensions). FeedSync is a family of extensions to RSS and Atom that enables bidirectional synchronization of XML feeds and the resources they reference. However synchronization in LiveFX is based on a client/server model instead of a peer-to-peer model which means that instead of the server subscribing to changes from the client and vice versa, clients subscribe to changes from the server and then inform the server when they make a change. More information about how LiveFX implements FeedSync can be found here.

  • Query Model and Linq to REST: LiveFX supports the same URI query parameters for paging, sorting, filtering and inline expansion of linked resources as other Microsoft cloud-based data APIs including ADO.NET Data Services (formerly Astoria), SQL Server Data Services and Windows Azure Storage services. One of the benefits of this consistency is that the ADO.NET client libraries can be used to perform Linq queries over the LiveFX data types using a technology that has been colloquially described as Linq to REST. For example, the following C# Linq query actually executes HTTP GET requests using the $filter parameter under the covers.

    MeshObject GetMeshObjectByTitle(string title)
    {
        MeshObject meshObject = (from mo in mesh.CreateQuery<MeshObject>()
                                 where mo.Resource.Title == title
                                 select mo).FirstOrDefault<MeshObject>();
        return meshObject;
    }

    The HTTP request this makes over the wire is

    GET https://user-ctp.windows.net/V0.1/Mesh/MeshObjects/{meshObjectID}/DataFeeds/?$filter=(Title eq 'WhateverWasPassedIn')
  • Batch Requests via Resource Scripts: LiveFX supports batching using a construct known as resource scripts. Using resource scripts a developer can submit a single request which contains multiple create, retrieve, update and delete operations at once. A resource script consists of a control flow statement which can contain one or more control flow statements, web operation statements, synchronization statements, data flow constructs and data flow statements.  You can find out more about resource scripts by reading the document About Live Framework Resource Scripts on the LiveFX developer site.

  • Resource Introspection via OPTIONS: One problem that commonly occurs in REST APIs is determining which operations a resource supports. Some protocols like OpenSocial specify a mechanism where HTTP responses should indicate which parameters are not supported. The problem with this approach is that the client has to first make an HTTP request then have it fail before determining if the feature is supported. LiveFX supports the HTTP OPTIONS verb on every resource. By performing the following HTTP request

    OPTIONS https://user-ctp.windows.net/V0.1/{UserID}/Mesh/MeshObjects/{meshObjectID}

    a client can retrieve an introspection metadata document which describes what query parameters and serialization formats the resource supports. The $metadata query parameter can also be used to retrieve the introspection metadata document, which enables clients using libraries that don't support making HTTP OPTIONS requests to also retrieve introspection metadata. A minimal sketch of issuing such a request follows this list.

  • Support for Portable Contacts: By specifying the parameter $type=portable when requesting contact data, the results will be returned in the Portable Contacts schema format as either JSON or Plain Old XML. 
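As promised in the resource introspection bullet above, here is a minimal sketch of issuing an OPTIONS request from .NET. The URI shape follows the documentation excerpt; the placeholder IDs and the omitted Authorization header need to be filled in with real values.

using System;
using System.IO;
using System.Net;

class ResourceIntrospectionSketch
{
    static void Main()
    {
        // Hypothetical placeholder IDs -- substitute the real user and MeshObject IDs.
        string userId = "someUserId";
        string meshObjectId = "someMeshObjectId";

        var request = (HttpWebRequest)WebRequest.Create(
            "https://user-ctp.windows.net/V0.1/" + userId + "/Mesh/MeshObjects/" + meshObjectId);
        request.Method = "OPTIONS";
        // request.Headers["Authorization"] = "...delegated authentication token...";

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            // The body is an introspection metadata document listing the query
            // parameters and serialization formats this resource supports.
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}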

The Libraries

Like most major Web vendors who have exposed REST APIs, Microsoft has provided client libraries to make interacting with the LiveFX service more natural for developers who aren't comfortable programming directly against HTTP. The following client libraries are provided

  • A generic AtomPub client library for the .NET Framework. Learn more about programming with it here.
  • A .NET Framework library which provides a high-level object model for interacting with the LiveFX service. More details here.
  • A Javascript library which provides a high-level object model for interacting with the LiveFX service. Programming samples can be found here.

The Live Operating Environment

The Live Operating Environment refers to two things. The first is the Web platform upon which the LiveFX REST APIs are implemented. The second is a local cache that runs on your PC or other device and exposes the same REST interface as the LiveFX Web service. This is somewhat similar to Google Gears except that the database is accessed RESTfully instead of via a SQL API.

The intent of the local version of the Live Operating Environment is to enable developers to build apps that target the desktop or the Web without having to change their programming model. All that needs to change is the base URI, from https://user-ctp.windows.net to http://localhost:2048, when accessing LiveFX resources. Everything else works exactly the same.
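A minimal sketch of what that looks like in practice: the only thing that changes between targeting the cloud service and the local Live Operating Environment is the base URI, with the resource path below being illustrative.

using System;
using System.Net;

class LocalOrCloudSketch
{
    // Flip this flag (or read it from configuration) to target the local
    // Live Operating Environment instead of the cloud service.
    static readonly bool useLocalEnvironment = true;

    static string BaseUri
    {
        get { return useLocalEnvironment ? "http://localhost:2048" : "https://user-ctp.windows.net"; }
    }

    static void Main()
    {
        // Everything after the base URI stays the same because the local
        // environment exposes the same REST interface as the cloud endpoint.
        string uri = BaseUri + "/V0.1/Mesh/MeshObjects";
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        Console.WriteLine("Would issue GET against: " + uri);
    }
}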

The Bottom Line

As the title of this blog post states, there is a lot of similarity in concept between LiveFX and Google's Data APIs (GData). Like GData, LiveFX provides a consistent set of AtomPub based APIs for accessing resources from a particular vendor's online services along with a set of client libraries that wrap these RESTful APIs. And just like GData, there are Microsoft-specific extensions to the Atom syndication format and custom query parameters for sorting, filtering and paging. LiveFX also supports batching like GData; however, from my perspective adding batching to a Web service seems like an attempt to reinvent distributed transactions. This is a bad idea given the flaws of distributed transactions that are well discussed in Pat Helland's excellent paper Life beyond Distributed Transactions: An Apostate's Opinion.

A number of LiveFX's additional features, such as synchronization and resource introspection, have no analog in GData and are fairly interesting, and I wouldn't be surprised to see these ideas get further traction in the industry. On the flip side, the client-side Live Operating Environment is a technology whose benefits elude me. I admit it is kind of cool but I can't see its utility.

Note Now Playing: John Legend - Green Light (feat. Andre 3000) Note


 

Categories: Web Development | Windows Live

About a year ago I wrote up a definition of a social operating system in my post The Difference between a Social Network Site, a Social Graph Application and a Social OS which I think is worth revisiting today. In that post I defined a Social OS as

Social Operating System: These are a subset of social networking sites. In fact, the only application in this category today is Facebook.  Before you use your computer, you have to boot your operating system and every interaction with your PC goes through the OS. However instead of interacting directly with the OS, most of the time you interact with applications written on top of the OS. Similarly a Social OS is the primary application you use for interacting with your social circles on the Web. All your social interactions whether they be hanging out, chatting, playing games, watching movies, listening to music, engaging in private gossip or public conversations occurs within this context. This flexibility is enabled by the fact that the Social OS is a platform that enables one to build various social graph applications on top of it.

In retrospect, the fundamental flaw with this definition is that it encourages services that want to become social operating systems to aspire to become walled gardens. The problem with walled gardens is that they shortchange users: the Web is about sharing and communicating with people from all over the world, while walled gardens limit you to interacting with people (and content) that are part of a particular online service or Web site.

Jeremy Zawodny had a great post about this entitled There is no Web Operating System (or WebOS) where he wrote

Luckily, two of my coworkers caught on to what I was saying and managed to help put it into context a bit. First off was Matt McAlister (who runs YDN, the group I work in). In The Business of Network Effects he does a good job of explaining how businesses and services in a network are fundamentally different from those which are isolated islands.

Recalling a brief conversation we had a couple weeks ago, he says:

Jeremy Zawodny shed light on this concept for me using building construction analogies.
He noted that my building contractor doesn't exclusively buy Makita or DeWalt or Ryobi tools, though some tools make more sense in bundles. He buys the tool that is best for the job and what he needs.
My contractor doesn't employ plumbers, roofers and electricians himself. Rather he maintains a network of favorite providers who will serve different needs on different jobs.
He provides value to me as an experienced distribution and aggregation point, but I am not exclusively tied to using him for everything I want to do with my house, either.
Similarly, the Internet market is a network of services. The trick to understanding what the business model looks like is figuring out how to open and connect services in ways that add value to the business.

Bingo.

The web is a marketplace of services, just like the "real world" is. Everyone is free to choose from all the available services when building or doing whatever it is they do. The web just happens to be a far more efficient marketplace than the real world for many things. And it happens to run on computers that each need an operating system.

But nobody ever talks about a "Wall Street Operating System" or a "Small Business Operating System" do they? Why not?

Ian Kennedy followed up to Matt's post with The Web as a Loose Federation of Contractors in which he says:

I like Jeremy's illustration - an OS gives you the impression of an integrated stack which leads to strategies which favor things like user lock-in to guarantee performance and consistency of experience. If you think of the web as a loose collections of services that work together on discreet projects, then you start to think of value in other ways such as making your meta-data as portable and accessible as possible so it can be accessed over and over again in many different contexts.

Bingo again.

No matter how popular a particular website becomes it will not be the only service used by its customers. So it follows that no matter how popular a social networking site becomes, it will not be the only social networking service used by its customers or their friends. Thus a true Social Operating System shouldn't be about creating a prettier walled garden than your competitors but instead about making sure you can bring together all of a user's social experiences regardless of whether they are on your site or on those of a competing service. If I use Twitter and my wife doesn't, I'd like her to know what I'm doing via the service even though she isn't a Twitter user. If my friends use Yelp to recommend restaurants in the area, I'd like to find out about the restaurants even though I'm not a Yelp user. And so on.

With the latest release of Windows Live, we're working towards bringing this vision one step closer to reality. You can read more about it in the official announcement and the accompanying blog post. I guess the statement "life without walls" also applies to Windows Live. Wink

Note Now Playing: T-Pain - Chopped N Skrewed (Feat. Ludacris) Note


 

Categories: Social Software | Windows Live

From the Microsoft press release Microsoft Introduces Updated Windows Live Service we learn

REDMOND, Wash. — Nov. 12, 2008 — Microsoft Corp. today announced the next generation of Windows Live, an integrated set of online services that make it easier and more fun for consumers to communicate and share with the people they care about most. The new generation of Windows Live includes updated experiences for photo sharing, e-mail, instant messaging, as well as integration with multiple third-party sites. The release also includes Windows Live Essentials, free downloadable software that enhances consumers’ Windows experience by helping them simplify and enjoy digital content scattered across their PC, phone and on Web sites. For more information about windows live go to http://www.windowslive.com.

Consumers today are creating online content and sharing it in many places across the Web. To help make it simple for the more than 460 million Windows Live customers to keep their friends up to date, Microsoft is collaborating with leading companies including Flickr, LinkedIn Corp., Pandora Media Inc., Photobucket Inc., Twitter, WordPress and Yelp Inc. to integrate activities on third-party sites into Windows Live through a new profile and What’s New feed. The new Windows Live also gives consumers the added convenience of having a central place to organize and manage information.

It's really exciting to know that hundreds of millions of people will soon be able to take advantage of what we've been thinking about and working on over the past year. I plan to hold my tongue until everyone can play with the new version themselves. For now I'll leave some links and screenshots from various blogs showing our baby off.

From TechCrunch: Sweeping Changes At Live.com: It’s A Social Network!

From Windows Live Wire: Windows Live – Keeping your life in sync

 

From Mary Jo Foley on All About Microsoft: Windows Live Wave 3: Microsoft’s kinder and simpler consumer services strategy?

From Kara Swisher on BoomTown: Microsoft Officially Facebooks, Oops, Socializes Windows Live Internet Services

Note Now Playing: Terence Trent D'Arby - Sign Your Name Note


 

Categories: Windows Live

I remember taking an operating systems class in college and marveling at the fact that operating system design seemed less about elegant engineering and more about [what I viewed at the time as] performance hacks. I saw a similar sentiment recently captured by Eric Florenzano in his post It's Caches All the Way Down where he starts describing how a computer works to a friend and ends up talking about the various layers of caching from CPU registers to L2 caches to RAM and so on.

At the end of his post Eric Florenzano asks the following question I've often heard at work and in developer forums like programming.reddit

That's what struck me. When you come down to it, computers are just a waterfall of different caches, with systems that determine what piece of data goes where and when. For the most part, in user space, we don't care about much of that either. When writing a web application or even a desktop application, we don't much care whether a bit is in the register, the L1 or L2 cache, RAM, or if it's being swapped to disk. We just care that whatever system is managing that data, it's doing the best it can to make our application fast.

But then another thing struck me. Most web developers DO have to worry about the cache. We do it every day. Whether you're using memcached or velocity or some other caching system, everyone's manually moving data from the database to a cache layer, and back again. Sometimes we do clever things to make sure this cached data is correct, but sometimes we do some braindead things. We certainly manage the cache keys ourselves and we come up with systems again and again to do the same thing.

Does this strike anyone else as inconsistent? For practically every cache layer down the chain, a system (whose algorithms are usually the result of years and years of work and testing) automatically determines how to best utilize that cache, yet we do not yet have a good system for doing that with databases. Why do you think that is? Is it truly one of the two hard problems of computer science? Is it impossible to do this automatically? I honestly don't have the answers, but I'm interested if you do.

Eric is simultaneously correct and incorrect in his statements around caching and database layers. Every modern database system has caching mechanisms that are transparent to the developer. For LAMP developers there is the MySQL Query Cache, which transparently caches the text of a SELECT query along with its results so that the next time the query is performed it is fetched from memory. For WISC developers there are SQL Server's data and procedure caches, which store recently accessed data pages and query plans respectively, to prevent having to repeatedly perform expensive computations or go to disk to fetch recently retrieved data. As with everything in programming, developers can eke more value out of these caches by knowing how they work. For example, using parameterized queries or stored procedures significantly reduces the size of the procedure cache in SQL Server. Tony Rogerson wrote an excellent post where he showed how switching from SQL queries based on string concatenation to parameterized queries can reduce the size of a procedure cache from over 1 Gigabyte to less than 1 Megabyte. This is similar to the way understanding how garbage collection or memory allocation works teaches developers to favor recycling objects instead of creating new ones and to favor arrays over linked lists to reduce memory fragmentation.
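To illustrate the parameterized query point, here is a hedged C# sketch of the two styles. With string concatenation every distinct title produces different query text, so SQL Server compiles and caches a separate plan for each value; with parameters the query text is constant and a single cached plan is reused. The table and column names are made up for the example.

using System.Data.SqlClient;

class ProcedureCacheSketch
{
    // Anti-pattern: each distinct title yields different query text,
    // bloating the procedure cache with near-identical plans.
    static SqlCommand ConcatenatedQuery(SqlConnection conn, string title)
    {
        return new SqlCommand(
            "SELECT * FROM Posts WHERE Title = '" + title.Replace("'", "''") + "'", conn);
    }

    // Preferred: constant query text means one cached plan is reused
    // no matter how many different titles are looked up.
    static SqlCommand ParameterizedQuery(SqlConnection conn, string title)
    {
        var cmd = new SqlCommand("SELECT * FROM Posts WHERE Title = @title", conn);
        cmd.Parameters.AddWithValue("@title", title);
        return cmd;
    }
}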

Even though there are caching mechanisms built into relational databases that are transparent to the developer, they typically aren't sufficient for high performance applications with significant amounts of read load. The biggest problem is hardware limitations. A database server will typically have twenty to fifty times more hard drive storage capacity than it has memory. Optimistically this means a database server can cache about 5% to 10% of its entire data in memory before having to go to disk. In comparison, let's look at some statistics from a popular social networking site (i.e. Facebook)

  • 48% of active users log-in daily
  • 49.9 average pages a day viewed per unique user

How much churn do you think the database cache goes through if half of the entire user base is making data requests every day? This explains why Facebook has over 400 memcached hosts storing over 5 Terabytes of data in-memory instead of simply relying on the query caching features in MySQL. This same consideration applies for the majority of social media and social networking sites on the Web today.

So the problem isn't a lack of transparent caching functionality in relational databases today. The problem is the significant differences in the storage and I/O capacity of memory versus disk in situations where a large percentage of the data set needs to be retrieved regularly.  In such cases, you want to serve as much data from memory as possible which means going beyond the physical memory capacity of your database server and investing in caching servers.
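The way most large sites do this today is the cache-aside pattern hinted at by the Facebook/memcached numbers above: check a pool of in-memory cache servers first and only fall through to the database on a miss. Below is a rough sketch of the pattern using a hypothetical ICache interface rather than any particular memcached client library.

using System;

// Hypothetical abstraction over a distributed in-memory cache such as memcached.
interface ICache
{
    object Get(string key);
    void Set(string key, object value, TimeSpan expiry);
}

class UserProfile
{
    public int UserId { get; set; }
}

class UserProfileRepository
{
    readonly ICache cache;

    public UserProfileRepository(ICache cache) { this.cache = cache; }

    public UserProfile GetProfile(int userId)
    {
        string key = "profile:" + userId;

        // Serve from memory when possible; the cache tier can hold far more of
        // the hot data set than the database server's own memory can.
        var cached = cache.Get(key) as UserProfile;
        if (cached != null) return cached;

        // Cache miss: hit the database, then populate the cache for next time.
        UserProfile profile = LoadProfileFromDatabase(userId);
        cache.Set(key, profile, TimeSpan.FromMinutes(30));
        return profile;
    }

    UserProfile LoadProfileFromDatabase(int userId)
    {
        // Placeholder for the actual database query.
        return new UserProfile { UserId = userId };
    }
}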

Note Now Playing: Kanye West - Two Words (feat. Mos Def, Freeway & The Harlem Boys Choir) Note


 

Categories: Web Development

At the recent Microsoft Professional Developer's Conference there was a presentation by Anders Hejlsberg about The Future of C# which unveiled some of the upcoming features of C# 4.0. Of all the features announced, the one I found most interesting was the introduction of duck typing via a new static type called dynamic.

For those who aren't familiar with Duck Typing, here is the definition from Wikipedia

In computer programming, duck typing is a style of dynamic typing in which an object's current set of methods and properties determines the valid semantics, rather than its inheritance from a particular class. The name of the concept refers to the duck test, attributed to James Whitcomb Riley (see History below), which may be phrased as follows:

If it walks like a duck and quacks like a duck, I would call it a duck.

In duck typing one is concerned with just those aspects of an object that are used, rather than with the type of the object itself. For example, in a non-duck-typed language, one can create a function that takes an object of type Duck and calls that object's walk and quack methods. In a duck-typed language, the equivalent function would take an object of any type and call that object's walk and quack methods. If the object does not have the methods that are called then the function signals a run-time error.

This is perfectly illustrated with the following IronPython program

def PrintAt(container, index):
    print "The value at [%s] is %s" % (index, container[index])

if __name__ == "__main__":
    # create a dictionary (hash table)
    table = {1 : "one", 2 : "two", 3 : "three"}

    # create a sequence (list)
    fruits = ["apple", "banana", "cantaloupe"]

    PrintAt(table, 1)
    PrintAt(fruits, 1)

In the above program, the PrintAt() function simply requires that the container object passed in supports the index operator '[]' and can accept whatever type is passed in as the index. This means I can pass either a sequence (i.e. a list) or a dictionary (i.e. a hash table) to the function and it returns results even though the semantics of using the index operator are very different for lists and dictionaries.

Proponents of static typing have long argued that features like duck typing lead to hard-to-find bugs which are only detected at runtime after the application has failed instead of during development via compiler errors. However there are many situations even in statically typed programming where the flexibility of duck typing would be beneficial. A common example is invoking JSON or SOAP web services and mapping these structures to objects. 

Recently, I had to write some code at work which spoke to a JSON-based Web service and struggled with how to deal with the fact that C# requires me to define the class of an object up front before I can use it in my application. Given the flexible and schema-less nature of JSON, this was a problem. I ended up using the JsonReaderWriterFactory to create an XmlDictionaryReader which maps JSON into an XML document which can then be processed flexibly using XML technologies. Here's what the code looked like

using System;
using System.IO; 
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml; 

namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            string json = @"{ ""firstName"": ""John"", ""lastName"": ""Smith"", ""age"": 21 }";
            var stream = new MemoryStream(ASCIIEncoding.Default.GetBytes(json));
            var reader = JsonReaderWriterFactory.CreateJsonReader(stream, XmlDictionaryReaderQuotas.Max); 

            var doc = new XmlDocument(); 
            doc.Load(reader);
            
            //error handling omitted for brevity
            string firstName = doc.SelectSingleNode("/root/firstName").InnerText; 
            int age          = Int32.Parse(doc.SelectSingleNode("/root/age").InnerText); 
            
            Console.WriteLine("{0} will be {1} next year", firstName, age + 1);
            Console.ReadLine();  
        }
    }
}

It works but the code is definitely not as straightforward as interacting with JSON from Javascript. This is where the dynamic type from C# 4.0 would be very useful. With this type, I could rewrite the above code as follows

using System;
using System.IO; 
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml; 

namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            string json = @"{ ""firstName"": ""John"", ""lastName"": ""Smith"", ""age"": 21 }";
            var stream = new MemoryStream(ASCIIEncoding.Default.GetBytes(json));
            dynamic person = JsonObjectReaderFactory.CreateJsonObjectReader(stream, XmlDictionaryReaderQuotas.Max); 

            Console.WriteLine("{0} will be {1} next year", person.firstName, person.age + 1);
            Console.ReadLine();  
        }
    }
}

In the C# 4.0 version I can declare a person object whose class/type I don't have to define at compile time. Instead, the property accesses are converted by the compiler into dynamic calls to the named properties, which are resolved via reflection at runtime. So when my [imaginary] JsonObjectReader dynamically creates the person object from the input JSON, my application works as expected with far fewer lines of code.

It's amazing how Python-like C# gets with each passing year.



 

Categories: Programming