Sunday, 28 March 2010 - Dare Obasanjo's weblog

March 15, 2010

@ 04:05 PM

Some Thoughts on Location Based Services Like Gowalla and FourSquare

I was recently on a panel at the South by Southwest Interactive conference (SXSW) where we discussed multiple applications of the real-time Web and the things that might prevent us from seeing its true potential. I’ve found it interesting that the key takeaway from the panel is that privacy issues will be one of the biggest problems we will face as we move forward. You can see this perspective in CNN’s coverage of the panel in the story Privacy concerns hinder 'real-time Web' creation, developers say and GigaOm’s write-up SXSW: Is Privacy on the Social Web a Technical Problem?

This overlap of privacy and real-time web features is brought into sharp relief when you look at services such as Foursquare and Gowalla, which provide a mechanism for people to broadcast their physical location to a group of friends in real-time. I started using Foursquare last week and I’ve noticed I’m even more careful about who I accept friend requests from than on Facebook or Windows Live Messenger. The fact is that I may share status updates and photos with people but it doesn’t mean I want them to be aware of where I am on an up-to-the-minute basis, especially if I’m out spending time with my family and friends. This difference between how we view location data and how we view other sorts of real-time data we share is captured by the co-founder of Foursquare in the article Facebook Isn't For Real Life Friends Anymore, Says Foursquare's Dennis Crowley where it states

Facebook plans to clone Foursquare's central service -- the ability for site members to use their phones to "check-in" from restaurants and bars -- and make it a mere Facebook feature.

But Foursquare cofounder Dennis Crowley says there's something Facebook can't clone: the real-life friendships between Foursquare users.

"Facebook used to be who your friends are, now it's everyone," Dennis told us in an interview.

"[Foursquare] is more tightly curated to who you want to have as your check-in friends. Facebook is good place for status updates and sharing photos, not to keep tabs on where people are going."

I think Crowley is on to something when he says Facebook can’t clone the Foursquare relationship model. I suspect that, like Twitter, Foursquare has created a social network whose value proposition is differentiated enough from Facebook’s that it can grow into a relatively popular albeit smaller service that will not be “killed” by Facebook*. Secondly, there is a lot of synergy between Foursquare and Facebook, as evidenced by the fact that Facebook is the largest referrer of traffic to Foursquare thanks to their implementation of Facebook Connect. So I think the claims that one will kill the other are just the usual tech press creating conflict to generate page views.

One thing I have noticed is that location can’t just be a field you bolt on to a status update. It has to be a key part of the information you are sharing with others otherwise it adds little value to the user experience and in fact may detract from it by adding clutter. For example, compare what a location-based update from Foursquare looks like on Facebook versus what the exact same update looks like on Twitter

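[Screenshot: the Foursquare update as it appears on Facebook]

VS

[Screenshot: the same update as it appears on Twitter]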

The difference between the two updates is almost night and day even though the actual status text I shared is the same. The way Twitter has approached location is to treat it as a bunch of “poorly translated” GPS coordinates that are bolted on to the end of my status update. The Facebook update not only gives you that but also a human-readable location for where I am, down to the room number, and includes some social context such as the fact that I was attending the talk with two coworkers from Windows Live.

As real-time location data starts to permeate social experiences, there’s a lot to learn from the above screenshots. In the example above, people who were interested in the topic based on my status knew which room to find danah’s talk in from the Facebook update, whereas they were only told “downtown austin” in the Twitter update. As designers of social software applications, we should make sure that location data actually enhances the experience and the information being shared. Adding location simply for buzzword compliance, or to add metadata to the status update without enhancing the experience, actually ends up crufting it up.

* Twitter’s value proposition is that it is the place to interact with celebrities and microcelebrities that you care about. It is useful to note that the much maligned Suggested Users List was key in establishing this value proposition in the minds of users. This is different from Facebook’s position as the social network for your real world friends, family, coworkers and acquaintances.

Now Playing: B.O.B. - Nothin' On You (featuring Bruno Mars)

Categories: Social Software | Startup Shoutout

March 10, 2010

@ 03:06 PM

Comments [2]

Building Scalable Databases: Are Relational Databases Compatible with Large Scale Websites?

A few weeks ago Todd Hoff over on the High Scalability blog penned a blog post titled MySQL and Memcached: End of an Era? where he wrote

If you look at the early days of this blog, when web scalability was still in its heady bloom of youth, many of the articles had to do with leveraging MySQL and memcached. Exciting times. Shard MySQL to handle high write loads, cache objects in memcached to handle high read loads, and then write a lot of glue code to make it all work together. That was state of the art, that was how it was done. The architecture of many major sites still follow this pattern today, largely because with enough elbow grease, it works.
…
With a little perspective, it's clear the MySQL+memcached era is passing.
…
LinkedIn has moved on with their Project Voldemort. Amazon went there a while ago.

Digg declared their entrance into a new era in a post on their blog titled Looking to the future with Cassandra,
…
Twitter has also declared their move in the article Cassandra @ Twitter: An Interview with Ryan King.

Todd’s blog has been a useful source of information on the topic of scaling large scale websites since he catalogues as many presentations as he can find from industry leaders on how they’ve designed their systems to deal with millions to hundreds of millions of users pounding their services every day. What he’s written above is really an observation about industry trends and isn’t meant to attack any technology. I did find it interesting that many took it as an attack on memcached and/or relational databases and came out swinging.

One post which I thought tried to take a balanced approach to rebuttal was Dennis Forbes’ Getting Real about NoSQL and the SQL-Isn't-Scalable Lie where he writes

I work in the financial industry. RDBMS’ and the Structured Query Language (SQL) can be found at the nucleus of most of our solutions. The same was true when I worked in the insurance, telecommunication, and power generation industries. So it piqued my interest when a peer recently forwarded an article titled “The end of SQL and relational databases”, adding the subject line “We’re living in the past”. [Though as Michael Stonebraker points out, SQL the query language actually has remarkably little to actually to do with the debate. It would be more clearly called NoACID]
…
From a vertical scaling perspective — it’s the easiest and often the most computationally effective way to scale (albeit being very inefficient from a cost perspective) — you have the capacity to deploy your solution on powerful systems with armies of powerful cores, hundreds of GBs of memory, operating against SAN arrays with ranks and ranks of SSDs.

The computational and I/O capacity possible on a single “machine” are positively enormous. The storage system, which is the biggest limiting factor on most database platforms, is ridiculously scalable, especially in the bold new world of SSDs (or flash cards like the FusionIO).
…
From a horizontal scaling perspective you can partition the data across many machines, ideally configuring each machine in a failover cluster so you have complete redundancy and availability. With Oracle RAC and Sybase ASE you can even add the classic clustering approach. Such a solution — even on a stodgy old RDBMS — is scalable far beyond any real world need because you’ve built a system for a large corporation, deployed in your own datacenter, with few constraints beyond the limits of technology and the platform.

Your solution will cost hundreds of thousands of dollars (if not millions) to deploy, but that isn’t a critical blocking point for most enterprises. This sort of scaling that is at the heart of virtually every bank, trading system, energy platform, retailing system, and so on.

To claim that SQL systems don’t scale, in defiance of such obvious and overwhelming evidence, defies all reason.

There’s lots of good food for thought in both blog posts. Todd is right that a few large scale websites are moving beyond the horizontal scaling approach that Dennis brought up in his rebuttal, based on their experiences. What tends to happen once you’ve built a partitioned/sharded SQL database architecture is that you notice you’ve given up most of the features of an ACID relational database. You give up the advantages of the relationships by eschewing foreign keys, triggers and joins since these are prohibitively expensive to run across multiple databases. Denormalizing the data means that you give up on Atomicity, Consistency and Isolation when updating or retrieving results. In the end, all you have left is that your data is Durable (i.e. it is persistently stored), which isn’t much better than what you get from a dumb file system. Well, actually you also get to use SQL as your programming model, which is nicer than performing direct file I/O operations.

It is unsurprising that after being at this point for years, some people in our industry have wondered whether it doesn’t make more sense to use data stores that are optimized for the usage patterns of large scale websites instead of gloriously misusing relational databases. A good example of the tradeoffs is the blog post from the Digg team on why they switched to Cassandra. The database was already sharded, which made it infeasible to perform joins to answer queries such as “which of my friends Dugg this item?” So instead they had to perform two reads from SQL (all Diggs on an item and all of the user’s friends) and then perform the intersection operation in the PHP front-end code. If the item was not already cached, this led to disk I/O which could take seconds. To make the situation worse, you actually want to perform this operation multiple times on a single page view since it is reasonable to expect multiple Digg buttons on a page if it has multiple stories on it.
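To make the read-time approach concrete, here is a minimal sketch of the "two reads plus an intersection in application code" pattern. It is written in Python rather than Digg's PHP, and the function name and sample data are hypothetical stand-ins for the results of the two SQL reads.

```python
def friends_who_dugg(item_diggers, user_friends):
    """Intersect the two result sets in application code instead of a SQL join."""
    return set(item_diggers) & set(user_friends)


# Hypothetical results of the two separate reads against the sharded database.
diggers_of_item = ["alice", "bob", "carol", "dave"]  # everyone who Dugg the item
friends_of_user = ["bob", "dave", "erin"]            # all of the user's friends

print(sorted(friends_who_dugg(diggers_of_item, friends_of_user)))  # ['bob', 'dave']
```

The intersection itself is cheap; the cost described above comes from having to fetch both lists, possibly from disk, once for every Digg button on the page.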

An alternate approach is to denormalize the data and for each user store a list of stories that have been Dugg by at least one of their friends. So whenever I Digg an item, an entry is placed in each of my friends’ lists to indicate that story is now one that has been Dugg by a friend. That way when a friend of mine shows up, it is a simple lookup to say “is this story ID on the list of stories Dugg by one of their friends?” The challenge here is that it means Digging an item can result in literally thousands of logical write operations. It has traditionally been prohibitively expensive to incur such massive amounts of write I/O in relational databases with all of their transactionality and enforcing of ACID constraints. NoSQL databases like Cassandra, which assume your data is denormalized, are actually optimized for write I/O heavy operations given the necessity of having to perform enormous amounts of writes to keep data consistent.
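The denormalized approach can be sketched as a fan-out on write. The following is an in-memory illustration only, not Digg's or Cassandra's actual data model; the class and method names are hypothetical.

```python
from collections import defaultdict


class FriendDiggIndex:
    """Keeps, per user, the set of stories Dugg by at least one friend."""

    def __init__(self, friends_of):
        # friends_of: dict mapping a user to a list of that user's friends
        self.friends_of = friends_of
        self.dugg_by_friend = defaultdict(set)

    def digg(self, user, story_id):
        """Fan the Digg out to every friend's list; return the write count."""
        writes = 0
        for friend in self.friends_of.get(user, ()):
            self.dugg_by_friend[friend].add(story_id)
            writes += 1
        return writes

    def dugg_by_a_friend(self, user, story_id):
        """Single lookup at read time, no join or intersection needed."""
        return story_id in self.dugg_by_friend.get(user, ())


index = FriendDiggIndex({"dare": ["alice", "bob"], "alice": ["dare"], "bob": ["dare"]})
print(index.digg("dare", 42))               # 2 logical writes, one per friend
print(index.dugg_by_a_friend("alice", 42))  # True
print(index.dugg_by_a_friend("dare", 42))   # False
```

The write count grows with the number of friends, which is why a user with thousands of friends turns one Digg into thousands of logical writes.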

Digg’s usage of Cassandra actually serves as a rebuttal to Dennis Forbes’ article since they couldn’t feasibly get what they want with either horizontal or vertical scaling of their relational database-based solution. I would argue that introducing memcached into the mix would have addressed disk I/O concerns because all records of who has Dugg an item could be stored in-memory so comparisons of which of my friends have Dugg an item never have to go to disk to answer any parts of the query. The only caveat with that approach is that RAM is more expensive than disk so you’ll need a lot more servers to store 3 terabytes of data in memory than you would on disk.

However, the programming model is not the only factor one must consider when deciding whether to stay with a sharded/partitioned relational database versus going with a NoSQL solution. The other factor to consider is the actual management of the database servers. The sorts of questions one has to ask when choosing a database solution are listed in the interview with Ryan King of Twitter, where he gives the following checklist that they evaluated before deciding to go with Cassandra over MySQL

We first evaluated them on their architectures by asking many questions along the lines of:

How will we add new machines?

Are their any single points of failure?

Do the writes scale as well?

How much administration will the system require?

If its open source, is there a healthy community?

How much time and effort would we have to expend to deploy and integrate it?

Does it use technology which we know we can work with?

The problem with database sharding is that it isn’t really a supported out of the box configuration for your traditional relational database product especially the open source ones. How your system deals with new machines being added to the cluster or handles machine failure often requires special case code being written by application developers along with special hand holding by operations teams. Dealing with issues related to database replication (whether it is multi-master or single master) also often takes up unexpected amounts of manpower once sharding is involved.
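One reason adding machines needs special handling can be shown with a small sketch. It assumes a naive scheme where the application picks a shard with `hash(key) % number_of_shards`; that scheme, the function names and the key format are assumptions for illustration, not something any particular site is known to use.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard from a stable hash of the key (naive modulo placement)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def moved_fraction(keys, old_count, new_count):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_count) != shard_for(key, new_count)
    )
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10000)]
# Going from 4 to 5 shards relocates most keys under this scheme.
print(moved_fraction(keys, 4, 5))
```

Under this assumption, growing from four to five machines moves roughly four out of five keys, so the application and operations teams have to plan and script a data migration themselves.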

For these reasons I expect we’ll see more large scale websites decide that, instead of treating a SQL database as a denormalized key-value pair store, they would rather use a NoSQL database. However I also suspect that a lot of services that already have a sharded relational database + in-memory cache solution can get a lot of mileage from more judicious usage of in-memory caches before switching. This is especially true given that you still need caches in front of your NoSQL databases anyway. There’s also the question of whether traditional relational database vendors will add features to address the shortcomings highlighted by the NoSQL movement. Given that the sort of companies adopting NoSQL are doing so because they want to save costs on software, hardware and operations, I somehow doubt that there is a lucrative market here for database vendors versus adding more features that the banks, insurance companies and telcos of the world find interesting.

Now Playing: Birdman - Money To Blow (featuring Drake & Lil Wayne)

Categories: Web Development

February 28, 2010

@ 05:19 PM

Comments [0]

Achievements, Game Mechanics and Social Software

Earlier this morning I saw the following tweet by Alex Payne, one of the developers who works on Twitter

Game mechanics aren't going to fix your product and they aren't making people's lives better. Great essay: http://j.mp/aN66i8

Alex’s description piqued my interest so I checked out the article by Peter Michaud titled Achievement Porn and not only agreed that it is a great essay but walked away from it with a fairly different conclusion from Alex. Below are two key excerpts from the article

The game article, and the meta discussion surrounding it is actually part of an even larger discussion that affects more than just video gamers. Games are just a minor symptom of a systematic disease:

Our society is set up to make us feel as though we must always achieve and grow. That’s true because individuals growing tend to bolster the power and creature comforts of the groups they belong to with inventions, innovations, and impressive grandstanding (Go Team!).

Because of this pressure to grow, there’s another incentive to make growth easier. More perversely, to make growth seem easier.

Why work hard for achievements, when you could relax and achieve the same? That’s not pathological, that’s how exponential progress works.

But why achieve at all when you can plug into any number of “achievement games” and get the same personal satisfaction?
…
The good news is that these little “achievement games” are fairly easy to recognize once you realize what’s going on. The bad news is that more are cropping up at an alarming rate, sped largely by the intertubes.

Games fast becoming standard are the “followers” and “friends” games for example. Twitter, FaceBook, LinkedIn, et al, all have their own ostensible raison d’etre, but the psychological underpinning they all share is this treadmill of achievement. This accumulation of points that’s correlated with whatever the intended benefit of the service is.

I find this discussion interesting because it matches the theme of my most recent posts: the difference between adding features that are good for users versus good for the product. The physiological underpinnings that make achievement games work have been covered quite well in the Slate article Seeking: How the brain hard-wires us to love Google, Twitter, and texting. And why that's dangerous. The article argues that our brains are wired to derive more pleasure from chasing after something than from actually getting it. However, although we are hard wired to constantly chase after achievement, it is our individual choice which achievements spur us. Thus it is the same underlying biology that explains the addictions of Tiger Woods and those of the World of Warcraft junkie.

Our lives are full of little “achievement treadmills”; it’s just that video games are the most obvious. A few months ago I started playing Call of Duty: Modern Warfare 2. I play it for about 1-2 hours every day and according to the game have clocked in almost two weeks of playing time. The game has lots of mini-achievements and ways to keep you grinding, from unlocking titles when you complete challenges like scoring 500 headshots with a particular weapon to encouraging you to de-power your character after you hit level 70 and start the grind all over again (I’m on my 3rd or 4th circuit). The interesting question is what have I lost by spending all this time playing MW2?

It turns out that the two activities that have suffered the most are my blogging and writing code for RSS Bandit. An insight from Peter Michaud’s post is that these were also achievement treadmills in their own way. On my blog all of my posts literally have a score which is the number of times other people have retweeted links to them, bookmarked them on delicious or shared them on Facebook. I also use FeedBurner and for a while used to obsess about my number of subscribers but eventually got over it since I don’t have the time or willingness to create the kind of content that generates a large following. As for RSS Bandit, the number of people who use it and the number of bugs I fixed have always been motivating factors. I can still remember the feeling I’d get when I’d see stats like 100,000 downloads a month or when I realized the application had been downloaded over a million times since it had started. Since I consider the glory days of Outlook-inspired desktop RSS readers to be in the past, I’m not as motivated as I once was to work on the project.

What it really boils down to is that I traded one set of “achievement treadmills” (i.e. blogging and contributing to an Open Source project) for another more explicit set (i.e. playing Modern Warfare 2). Now we can go back to Alex Payne’s tweet and find out where I disagree. From the perspective of Infinity Ward (creators of MW2) is it a bad thing for their business that they’ve created a game that has sucked me into almost 300 hours of play time? On the other hand, is it a good thing for me as a fully functioning member of society to have cut down my contributions to an Open Source project and the blogosphere to play a video game? Finally, is it better for me as a person to have traded achievement treadmills where I have little control over the achievements (i.e. number of blog subscribers, number of people who download a desktop RSS reader, etc) for one where I have complete control of the achievements as long as I dedicate the time?

I’ll leave the answers to those questions up to the reader. I will say game mechanics can more than “fix” a social software product; they can make it a massive success that its users are obsessed with. Just look at Farmville or FourSquare for explicit examples, or sites like Twitter which have inspired hundreds of guides to increasing your number of Twitter followers for a more subtle example. Does it mean that these products aren’t making their users’ lives “better”? Well, it depends on how you define better.

Now Playing: DJ Khaled - All I Do Is Win (featuring Ludacris, Rick Ross, T-Pain & Snoop Doggy Dogg)

Categories: Social Software

February 15, 2010

@ 04:22 PM

Comments [2]

Understanding the Real-Time Web for Web Developers

The term “The real-time web” has become popular as a way to describe burgeoning trends and technologies related to consuming web content as soon as it is created. However, like popular buzz phrases such as “services oriented architecture” and “web 2.0” which came before it, there is often difficulty in understanding where the technical details end and where the hype begins. Given that this trend is a fundamental shift in how many users interact with the web, it is a good idea for developers to have a clear idea of the key concepts and implementation options available to them as they bring their applications into the real-time web.

What Features and Functionality Make Up the Real-Time Web?

When people talk about real-time web technologies, they are usually talking about one or more of the following features

Refreshing a web page as new updates are available without reloading the page. A good example of this is performing a search on Twitter, where a yellow bar displays a constantly updated count of the new tweets posted since you started searching.
Receiving notifications on content updates as soon as they happen instead of polling. This is primarily about moving away from RSS’s model of polling every couple of minutes or hours and instead having an end point that gets messages delivered as soon as they happen. An example of this is the fact that user status updates from Twitter appear within a second on FriendFeed when the user has hooked up both services. This is in contrast to how long it takes blog posts to show up in the typical RSS reader from when they are published.
Some people consider the universe of status updates on sites like Facebook and Twitter to be the real-time web. For these people, the key interesting technology in this space is the ability to consume a neverending feed of content from these sites (aka a fire hose) and provide search functionality over this data.

We can now take a look at some of the underlying technologies that make some of these user experiences and scenarios possible.

Bringing Real-Time to AJAX: COMET, Long Polling and soon Web Sockets

Most web developers should be familiar with the concept of Asynchronous Javascript and XML (AJAX), which enables the creation of dynamic webpages that can be partially updated without having to reload the page. A traditional AJAX interaction involves the user interacting with part of the page, the browser submitting a request to the server, and then the browser rebuilding that part of the page with the results from the request. However it turns out that there are many situations where an application may want to update parts of a page without waiting for user interaction, such as displaying live stock tickers, instant messaging scenarios or showing feedback on an article as comments are posted. The set of approaches to solving this problem is typically described using the name COMET.

COMET typically refers to keeping a permanent open connection between a browser and a server using a number of techniques. One approach is the hidden iframe technique. With this technique you create an inline frame (i.e. an iframe) that is hidden from the user and then have the frame slowly filled with content as events occur on the server. This takes advantage of the fact that a browser will keep an open connection to the server as long as a page has not fully loaded. There’s a great example of what generating one of these invisible iframes looks like on the server side in the article How to implement COMET with PHP

  <?php
 
  header("Cache-Control: no-cache, must-revalidate");
  header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
  flush();

 
  ?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
 
    <title>Comet php backend</title>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  </head>
  <body>
 
  <script type="text/javascript">

    // KHTML browser don't share javascripts between iframes
    var is_khtml = navigator.appName.match("Konqueror") || navigator.appVersion.match("KHTML");
    if (is_khtml)
    {
      var prototypejs = document.createElement('script');
      prototypejs.setAttribute('type','text/javascript');
      prototypejs.setAttribute('src','prototype.js');
      var head = document.getElementsByTagName('head');
      head[0].appendChild(prototypejs);
    }
    // load the comet object
    var comet = window.parent.comet;
 
  </script>
 
  <?php
 
  while(1) {

    echo '<script type="text/javascript">';
    echo 'comet.printServerTime('.time().');';
    echo '</script>';
    flush(); // used to send the echoed data to the client

    sleep(1); // a little break to unload the server CPU
  }
 
  ?>
 
  </body>
  </html>

As you can see from this example, the page will never finish rendering, which means the browser always has an open connection to the server. It should also be noted that the payload of each new event is the Javascript function call that you want the browser to execute to update the relevant part of the page. This technique works in most common browsers since it uses established technologies like iframes and Javascript. The main problem is that, since it is somewhat of a hack, it is difficult for web applications to determine the current state of the communication between the browser and the server (e.g. for error handling).

Another common technique is long polling. With this approach the browser application makes an asynchronous request for data from the server using either XMLHttpRequest or a script tag, and the server holds the request open until it has data to return. Once data is returned, another request of the same type is made. This essentially keeps a permanently open connection between the browser and the server. This approach is often favored by developers because it reuses common AJAX techniques and doesn’t require any special client-side techniques. The main challenge with long polling and other COMET techniques is choosing a server-side framework that can solve the C10K problem. Specifically, traditional web servers are designed to handle short-lived connections between browsers and the server due to the request/response nature of HTTP. With COMET, we can have thousands to tens of thousands of browsers keeping an open connection to a server. To address this problem there are now a number of dedicated COMET application servers, with some of the more notable implementations being FriendFeed's Tornado and Jetty.
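The long polling loop can be sketched without a browser. The following is a minimal in-process simulation in Python, assuming a hypothetical `LongPollServer` class that stands in for the HTTP endpoint; a real client would issue XMLHttpRequest calls rather than method calls.

```python
import queue
import threading


class LongPollServer:
    """In-process stand-in for a COMET endpoint (hypothetical, no HTTP)."""

    def __init__(self):
        self._events = queue.Queue()

    def publish(self, event):
        self._events.put(event)

    def poll(self, timeout):
        """Hold the 'request' open until an event arrives or the timeout hits."""
        try:
            return self._events.get(timeout=timeout)
        except queue.Empty:
            return None  # empty response; the client simply asks again


def long_poll_client(server, max_events, timeout=1.0):
    received = []
    while len(received) < max_events:
        event = server.poll(timeout)  # blocks while the server has nothing new
        if event is not None:
            received.append(event)    # handle the update, then re-request
    return received


server = LongPollServer()


def producer():
    for i in range(3):
        server.publish(f"comment {i}")


threading.Thread(target=producer).start()
print(long_poll_client(server, max_events=3))
```

Each call to `poll` ties up a connection on the server for up to `timeout` seconds, which is where the C10K concern comes from once thousands of clients are doing this at the same time.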

The W3C’s HTML 5 working group is working on making COMET part of the next generation of HTML with the creation of the WebSockets specification. This formalizes the notion of an XMLHttpRequest-style object that can create a permanent bi-directional connection to the server without the hacks of long polling (i.e. re-establishing a connection whenever data is sent) and is also resistant to some of the issues long polling faces when going through HTTP intermediaries such as firewalls and proxy servers.

Notifications at the Speed of Light: Go Beyond Polling with PubSubHubbub

Content syndication using XML syndication technologies such as RSS and Atom is a key part of how many websites and web users consume content today. However XML syndication is traditionally not a real-time experience because feed readers work by polling a feed at specific intervals which could be anything from minutes to hours apart. Since polling is an inefficient way to get content updates it isn’t feasible to get updates within seconds from when they happen without overloading the service that is being polled. This calls for new communications patterns where clients are notified of changes as they happen instead of having to poll for them with the time lag that this entails.

To solve this problem, a couple of Google employees have proposed the PubSubHubbub protocol (commonly abbreviated as PuSH) as a way to bring real-time notifications to content syndication on the Web. The workflow for a PuSH system is as follows

A feed producer declares its Hub server(s) in its Atom or RSS XML file, via <link rel="hub" ...>. The hub(s) can be run by the publisher of the feed, or can be a community hub that anybody can use.
A feed reader or subscriber initially polls the feed as usual.
On seeing that the feed supports PuSH, the subscriber declares its interest in getting real-time notifications from the hub when the feed is updated. This is done by registering a callback URL so it can receive the newly updated content whenever the feed is updated.
When the feed producer next updates the feed, the publisher also pings the Hub(s) indicating there is an update.
The hub then retrieves the feed and publishes the changes to all interested subscribers by POSTing an Atom or RSS item to their specified callback URLs
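The first steps of the workflow above, discovering the hub and preparing a subscription, can be sketched as follows. This is a hedged illustration: the feed, hub and callback URLs are made up, the function names are hypothetical, and only the `hub.mode`, `hub.topic` and `hub.callback` parameters are shown; check the PubSubHubbub specification for the version you target, since other parameters may be required.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def discover_hubs(atom_xml):
    """Return the href of every <link rel="hub"> at the top level of the feed."""
    root = ET.fromstring(atom_xml)
    return [
        link.get("href")
        for link in root.findall(ATOM_NS + "link")
        if link.get("rel") == "hub"
    ]


def subscription_body(topic_url, callback_url):
    """Form-encoded body a subscriber would POST to the hub."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # the feed being subscribed to
        "hub.callback": callback_url,  # where the hub should POST updates
    })


# Made-up example feed; example.com URLs are placeholders.
feed = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link rel="self" href="http://example.com/feed.atom"/>
  <link rel="hub" href="http://hub.example.com/"/>
</feed>"""

print(discover_hubs(feed))
print(subscription_body("http://example.com/feed.atom",
                        "http://reader.example.com/push"))
```

A subscriber that finds no hub link simply keeps polling the feed as before, which is what makes the protocol easy to adopt incrementally.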

PubSubHubbub provides a more complete solution than other approaches such as Twitter’s firehose. With explicit subscription, services can use PuSH to get notified about updates that are either private or require authentication. With a firehose approach, all public content generated on the site is shared with all subscribers while private content is not provided since it isn’t really feasible to cherry pick which authenticated content should go to which consumer of the firehose.

There are already a number of sites consuming and producing PubSubHubbub including MySpace, LiveJournal, Google Reader, Tumblr and FriendFeed.

Creating and Consuming Fire Hoses: The Lynchpin of Real-Time Search

To many social media observers the real-time web refers to the ecosystem that has sprung up around status updates on sites like Twitter and Facebook. Being able to analyze status updates as they occur to determine people’s sentiments on a news event as it occurs or detect breaking news is a rapidly growing space with lots of players including Microsoft’s Bing, Tweetmeme and Sysomos among others. These services typically work by consuming a never ending stream of updates from the target services and then processing the updates as they are received. Twitter calls this never ending stream of updates a “firehose” and I’ll use this term to describe similar offerings from other services.

A firehose works similarly to the COMET servers described earlier. A client connects to a server via HTTP and then starts to receive a stream of updates as they occur in some structured format. An early version of such a firehose is the SixApart Update Stream which is used to provide real-time feeds on changes to TypePad, Vox and [formerly] LiveJournal blogs to interested parties. From the SixApart developer documentation

Connecting to the Stream

To connect to the stream a simple HTTP GET request is issued to the following endpoint:
http://updates.sixapart.com/atom-stream.xml

Once a connection is established, the Atom Server will then begin transmitting to the client any content that is injected into the stream. Additionally, the Atom Stream Server transmits timestamps every second both to keep the connection alive (in case it goes idle), and to provide you a marker so you know how far you've gotten so you can reconnect at a certain point in time if you restart your listener.

Example Stream
    GET /atom-stream.xml HTTP/1.0
    Host: updates.sixapart.com

    HTTP/1.0 200 OK
    Content-Type: text/xml
    Connection: close

    <?xml version="1.0"?>
    <time>1834721912342</time>
    <time>1834721912372</time>
    <time>1834721912402</time>
    <feed>
      ...
    </feed>
    <feed>
      ...
    </feed>
    <sorryTooSlow youMissed="5" />

This is very similar to the Twitter Streaming API with the primary differences being supported data formats (Twitter supports both JSON and XML output), support for filter predicates such as being limited to a firehose of posts containing references to “superbowl” and the fact that Twitter also provides notifications of deleted status updates in the stream.

Applications that consume such streams need to be carefully coded to handle falling behind and to restart from where they stopped if they are disconnected for any reason.
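As a rough illustration of that bookkeeping, here is a minimal sketch that scans stream lines shaped like the example above. It remembers the most recent `<time>` marker and adds up how many items the server says were missed. The line-by-line regular expression matching is a simplification I am assuming for brevity; a real consumer would more likely use an incremental XML parser. How a client resumes from a saved marker is not covered in the excerpt, so the sketch only records it.

```python
import re

TIME_RE = re.compile(r"<time>(\d+)</time>")
MISSED_RE = re.compile(r'<sorryTooSlow youMissed="(\d+)"\s*/>')


def track_stream(lines):
    """Return (last_timestamp, missed_count, feed_count) for an iterable of lines."""
    last_time = None  # most recent <time> marker seen, or None
    missed = 0        # total items the server reported as dropped
    feeds = 0         # number of <feed> elements opened
    for raw in lines:
        line = raw.strip()
        match = TIME_RE.fullmatch(line)
        if match:
            last_time = int(match.group(1))
            continue
        match = MISSED_RE.fullmatch(line)
        if match:
            missed += int(match.group(1))
            continue
        if line.startswith("<feed"):
            feeds += 1
    return last_time, missed, feeds


# The lines from the example stream above, with the HTTP headers omitted.
sample = [
    '<?xml version="1.0"?>',
    "<time>1834721912342</time>",
    "<time>1834721912372</time>",
    "<time>1834721912402</time>",
    "<feed>", "...", "</feed>",
    "<feed>", "...", "</feed>",
    '<sorryTooSlow youMissed="5" />',
]
last_time, missed, feeds = track_stream(sample)
```

A listener that persists `last_time` after each batch has a marker to resume from after a restart, and a non-zero `missed` is the signal that it is not keeping up with the stream.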

Learning More

If you find this topic interesting, I’ll be speaking about the real-time Web at two industry conferences next month with experts from noted Web companies.

MIX 10: Building Platforms and Applications for the Real-Time Web
From news feeds to search, the Web has become all about real-time access to news and other information as it happens. This panel will discuss what it takes to build the platforms and user experiences that power some of the most notable services on the real-time web. Come hear a lively discussion about the real-time web with moderator Dare Obasanjo (Microsoft) and panelists Ari Steinberg (Facebook), Brett Slatkin (Google), Chris Saad (~~JS-Kit~~ Echo), Lili Cheng (Microsoft) and Ryan Sarver (Twitter).

SXSW: Can the Real-Time Web Be Realized?
The emergence of the real-time web enables an unprecedented level of user engagement and dynamic content online. However, the rapidly growing audience puts new, complex demands on the architecture of the web as we know it. This panel will discuss what is needed to make the real-time web achievable. Organizer: Brett Slatkin.

Now Playing: Young Money - Bedrock (featuring Lloyd)

Categories:

February 15, 2010

@ 02:59 PM

Comments [4]

Google Buzz vs. Google Wave

From the Google Wave Federation architecture white paper

Google Wave is a new communication and collaboration platform based on hosted documents (called waves) supporting concurrent modifications and low-latency updates. This platform enables people to communicate and work together in new, convenient and effective ways. We will offer these benefits to users of wave.google.com and we also want to share them with everyone else by making waves an open platform that everybody can share. We welcome others to run wave servers and become wave providers, for themselves or as services for their users, and to "federate" waves, that is, to share waves with each other and with wave.google.com. In this way users from different wave providers can communicate and collaborate using shared waves. We are introducing the Google Wave Federation Protocol for federating waves between wave providers on the Internet.

From a Buzz post by Dewitt Clinton, a Google employee

The best way to get a sense of where the Buzz API is heading is to take a look at http://code.google.com/apis/buzz/. You'll notice that the "coming soon" section mentions a ton of protocols—Activity Streams, Atom, AtomPub, MediaRSS, WebFinger, PubSubHubbub, Salmon, OAuth, XFN, etc.

What it doesn't talk much about is Google. That's because the goal isn't Google specific at all. The idea is that someday, any host on the web should be able to implement these open protocols and send messages back and forth in real time with users from any network, without any one company in the middle. The web contains the social graph, the protocols are standard web protocols, the messages can contain whatever crazy stuff people think to put in them. Google Buzz will be just another node (a very good node, I hope) among many peers. Users of any two systems should be able to send updates back and forth, federate comments, share photos, send @replies, etc., without needing Google in the middle and without using a Google-specific protocol or format.

From Mark Sigal’s post Google Buzz: Is it Project, Product or Platform?

I think that it's great that Google is iterating Gmail (read Tim O'Reilly's excellent write-up on it here), and actually improving an existing product, versus rolling out a knock-off of something that is already in the market.

Nonetheless. I am confused. I thought that Google Wave was destined to be the new Gmail, but after yesterday's announcement, I am left wondering if Gmail is, instead, the new Google Wave.

Since the saying goes that people in glass houses shouldn’t throw stones, I won’t make any comment besides sharing these links with you.

Now Playing: 50 Cent - Crime Wave

Categories: Competitors/Web Companies | Mindless Link Propagation

February 13, 2010

@ 03:51 PM

Comments [0]

Autofollowing on Social Networks and User Privacy Becoming a Pawn in a Competitive Chess Game

NOTE: For an official Microsoft statement on Google Buzz, go here. This post is a discussion of recent trends in social networking features in our industry and how they impact web users focusing on a feature of Google Buzz as a kick off point.

One of the much lauded features of the recently released Google Buzz is autofollowing, which is described as “No setup needed: Automatically follow the people you email and chat with a lot”. This feature solves the “what if you build it and they don’t come?” problem that Google Buzz faced. What if, when presented with a bunch of FriendFeed-like features in Gmail, people decided that they didn’t want to build another social network when they had already done so on places like Facebook, MySpace and Twitter? Autofollowing ensured that Gmail users already had a populated network of people they were receiving status updates from once Google Buzz was launched. So from the perspective of Google, it’s a great feature.

But is the feature in the best interests of users? Ignoring some of the privacy issues of the people you email with becoming a public friends list, there is still the question of whether the feature is good for users in isolation. Here’s a story: my wife is divorced and has kids from her previous marriage. This means she exchanges a lot of email with her ex-husband and his new wife around kid visiting schedules, vacations, etc. Do you think my wife would consider it a great feature if one day she started getting status updates on how her ex-husband and his new wife spend their days due to the introduction of social networking features in her email client?

Those of us building social networking products have a responsibility not only to ask if a feature is good for our product but also whether it is good for our users as well. Sometimes these goals align and sometimes they do not. What we do when they don’t is what defines us as an industry.

I want to also call out some of the thought leadership on this topic that has come from Marshall Kirkpatrick over on ReadWriteWeb with posts such as Why Facebook is Wrong: Privacy Is Still Important where he discusses Facebook’s privacy changes from last year. Personally, I think Facebook cleaned up their privacy model because they used to have privacy settings based on regional networks where user data was visible to people in a geographic region (e.g. everyone in New York City or everyone in Australia can see my profile information) which is actually kind of dumb. There have been legitimate privacy issues related to such loose settings, such as Rudy Giuliani's daughter being a Barack Obama supporter being visible to everyone from New York City on Facebook. With the change, people with such settings were asked if they wanted their profiles to be public since they effectively were in the old model. The question Marshall Kirkpatrick brings up is whether it is better for Facebook users in such situations to be asked do you want to go from everyone in New York can see my data –> public or only visible to my friends and networks? It is clear which is better for Facebook as a service but not so clear what is better for their users with regards to their personal notions of privacy and mental well being.

Social networking has transformed the way people communicate and relate to each other in many tangible ways. However, social networks are built on real human relationships and connections. I hate the thought that people’s relationships and communications are becoming the ammunition in a war between web companies to dominate a particular online space. We can be better than that. We must be better than that.

Now Playing: Bun B - You're Everything (featuring Rick Ross, David Banner, Eightball & MJG)

Categories:

February 9, 2010

@ 01:25 PM

Comments [13]

The iPhone Obsession and Lying with Statistics

PPK over at the QuirksMode blog recently wrote a rant titled The iPhone Obsession where he berates developers for focusing on building mobile sites that are targeted towards working well on the iPhone. To make his point, he uses the following statistics

Stats

Let’s illustrate that last remark with some smartphone sales stats:

Nokia: 39%

RIM: 20% (BlackBerry)

Apple: 15% (this 15% is obviously far more important than the previous 59%)

HTC: 5%

Other: 21% (Samsung is expected to make a major jump this year)

…

Source: Morgan Stanley Mobile Internet Report (48Meg PDF) p. 160

…

And here are the smartphone OS stats, also from Tomi Ahonen (whose blog I highly recommend, by the way):

Symbian: 45% (all of Nokia plus a bit of SonyEricsson and Samsung)

BlackBerry: 20%

iPhone: 15% (this 15% is obviously far more important than the previous 65%)

Windows Mobile: 6% (HTC, Samsung, SonyEricsson)

Android: 4% (HTC, Samsung, SonyEricsson, Motorola, Google)

Other: 10% (Various Linux builds, Palm, as well as really obscure stuff. Will be reinforced by Samsung Bada during this year.)

Despite the platform having only 15% sales market share we all want our mobile websites to look exactly like an iPhone app and we only want to use iPhone features.

Although these statistics seem persuasive they are actually totally useless when it comes to arguing the point of which browsers mobile developers should target. Ownership of a mobile phone doesn’t directly equate to using it for browsing the web. The important metric is the smartphone OS breakdown among people who actually use the mobile web on their phones.

You can get these stats easily from AdMob's mobile metrics report which is based on measuring ad impressions across various mobile sites across various smartphone OSes. These metrics paint a very different picture from the sales data as shown below

According to these stats, the iPhone OS is actually the major source of traffic for the mobile web in most continents except for Africa and Asia. What this tells you is that developers aren’t being stupid when they try to ensure their sites work well on the iPhone.

That said, I agree that it is a bad idea for developers to specifically target features of a particular browser versus using web standards. However that is different from making sure your site works well in the most popular platform used for browsing mobile websites in your particular market.

Categories:

January 18, 2010

@ 02:32 PM

Comments [5]

Does the world need OpenID Connect?

About two weeks ago Chris Messina wrote a post titled OpenID Connect where he argued for the existence of a Facebook Connect style technology built on OpenID. He describes the technology as follows

So, to summarize:

for the non-tech, uninitiated audiences: OpenID Connect is a technology that lets you use an account that you already have to sign up, sign in, and bring your profile, contacts, data, and activities with you to any compatible site on the web.

for techies: OpenID Connect is OpenID rewritten on top of OAuth WRAP using service discovery to advertise Portable Contacts, Activity Streams, and any other well known API endpoints, and a means to automatically bootstrap consumer registration and token issuance.

This is something I brought up over a year ago in my post Some Thoughts on OpenID vs. Facebook Connect. The fact is that OpenID by itself is simply not as useful as Facebook Connect. The former allows me to sign-in to participating sites with my existing credentials while the latter lets me sign-in, share content with my social network, personalize and find my friends on participating sites using my Facebook identity.

As I mentioned in my previous post there are many pieces of different “Open brand” technologies that can be pieced together to create something similar to Facebook Connect such as OpenID + OpenID Attribute Exchange + Portable Contacts + OAuth WRAP + Activity Streams. However no one has put together a coherent package that ties all of these together as a complete end-to-end solution. This isn’t helped by the fact that these specs are at varying levels of maturity and completion.

One reason this hasn’t happened is something I failed to anticipate. Back in late 2008, I assumed we would see lots of competitors to Facebook Connect. This hasn’t truly materialized. Google Friend Connect has turned out to be an interesting combination of OpenID sign-in and the ability to add “social” widgets to your site but not about integrating with Google’s social networking services in a deep way (probably because Google doesn’t have any?). MySpaceID has failed to gain traction and lacks key features of Facebook Connect such as being able to publish rich activities from a 3rd party site to MySpace. And that’s it. Those two technologies are the closest to Facebook Connect from a major online player and they fall far short.

So do we need an OpenID Connect? We would if there were lots of Facebook Connect style offerings that significantly differed in implementation. However there aren’t. One could argue that perhaps the reason we don’t have many is that there are no standards that guide websites on what to implement. However this sounds like using “standards” for inventing technologies instead of standardizing best practice. I’ve always considered this questionable from my days working with XML technologies such as XML Schema, SOAP and WSDL.

If you got together to create an OpenID Connect now, the best you could do is come up with a knock off of Facebook Connect using “Open brand” technologies since that’s the only example we have to work off of. That’s great until Facebook Connect adds more features or websites finally wrap their heads around this problem space and actually start competing with Facebook Connect. Premature standardization hinders instead of helps.

Although we might need OpenID Connect someday, that day isn’t today.

Now Playing: Ke$ha - TiK ToK

Categories: Social Software | Web Development

January 11, 2010

@ 03:16 PM

Comments [8]

Some thoughts on Facebook's change of stance on user privacy

Marshall Kirkpatrick has a post entitled Facebook's Zuckerberg Says The Age of Privacy is Over where he reviews some quotes by Mark Zuckerberg, the founder of Facebook, on their recent privacy changes and how these changes are reflecting evolving social norms. Below is an excerpt on Marshall's take on Mark Zuckerberg's comments

Facebook allows everyday people to share the minutiae of their daily lives with trusted friends and family, to easily distribute photos and videos - if you use it regularly you know how it has made a very real impact on families and social groups that used to communicate very infrequently. Accessible social networking technology changes communication between people in a way similar to if not as intensely as the introduction of the telephone and the printing press. It changes the fabric of peoples' lives together. 350 million people signed up for Facebook under the belief their information could be shared just between trusted friends. Now the company says that's old news, that people are changing. I don't believe it.

I think Facebook is just saying that because that's what it wants to be true.

There's lots of food for thought here. At first I wondered whether Facebook would have become the global phenomenon that it is today, where your friends, neighbors, coworkers and old school chums are sharing the minutiae of their lives with you, if it had been public by default. Then I realized that sort of thinking doesn't matter since Facebook has 350 million users today so wondering how things could have turned out years ago with a different design isn't particularly interesting.

What is interesting is considering why Facebook would want it to be true that many of their users think nothing of making their Facebook data public versus sharing it within their social network? The simple answer is Twitter.

Below is the Google Trends chart showing the difference in traffic between both sites.

In looking at the above chart, one might think it ludicrous that Facebook would have anything to fear from Twitter given that it has at least an order of magnitude more users. However compare the above chart to a comparison of news references and search queries for the phrases "search twitter" versus "search Facebook".

There are two things you learn from the above chart. The first is that the news media is a lot more interested in talking about search and Twitter instead of search and Facebook. This implies that even though Facebook has similar features to Twitter and ten times the user base, people don't talk about the power of being able to search Facebook status updates like they do about Twitter. The second is that there is actually more interest from people doing search queries in searching content on Facebook than in searching Twitter content, which is unsurprising since Facebook has a lot more users.

However the fact that status updates and other content on Facebook are private by default means Facebook cannot participate in this space, even though it has the same kind of content that Twitter does. That content is also more valuable because there is a lot more of it and it is backed by real identities, not anonymous users. Here's a quick list, off the top of my head, of the kinds of apps you can enable over Twitter's public stream of status updates that Facebook was locked out of until their privacy change

What The Trend – Lists topics that are currently trending on Twitter and why. Often a quick way to find breaking news before it is reported by the mainstream media.
Tweetmeme – The top links that are currently being shared on Twitter. Another source of breaking news and cool content. It's like Digg and Reddit but without having to vote on content on some geeky "social news" site.
Bitly.TV – A place to watch the videos that are currently being shared on Twitter.
Twittervision – A cool way to idle away the minutes by seeing what people all over the world are saying on Twitter.
Google Real-Time search – See what Twitter users are saying about a particular search term in real-time as part of your search results
Filtrbox – A tool that enables companies to see what their customers are saying about their products and brands on Twitter

All of these and more are the kinds of scenarios Facebook could enable if their status update streams were public instead of private. People think Twitter is worth $1 billion because it is sitting on this well of real-time status updates and has created this ecosystem of services that live off its stream. However Facebook is sitting on ten times as much data yet could not be a part of this world because of their history of being a privacy centered social network. Being able to participate in real-time search increases Facebook's value and broadens its reach across the Web. With the privacy changes in place this will now be the case, especially since 50 percent of their users have accepted the more public default privacy settings. Facebook can now participate in the same real-time ecosystem as Twitter and will bring more content that is easier to trust since it comes from people's real identities.

That said, I commend the people at Facebook for having the courage to evolve their product in the face of new market opportunities instead of being tied to their past. Lots of companies let themselves be ruled by fear and thus stick to the status quo for fear of ticking off their users which often leads to bored users. Kudos.

Now Playing: Flobots - Handlebars

Categories: Social Software

January 4, 2010

@ 03:06 PM

Comments [10]

Brizzly, Seesmic Web and the Future of RSS

Recently I came across two blogs I thought were interesting and would love to follow regularly; Chris Dixon's blog and the Inside Windows Live blog. What surprised me was that my first instinct was to see if they were on Twitter instead of adding their RSS feeds to my favorite RSS reader. I thought this was interesting and decided to analyze my internal thought process that led me to preferring following blogs via Twitter instead of consuming the RSS feeds in Google Reader + RSS Bandit.

I realized it comes down to two things, one I’ve mentioned before and the second which dawned on me recently

The first problem is that the user experience around consuming feeds in traditional RSS readers which take their design cues from email readers is all sorts of wrong. I’ve written about this previously in my post The Top 5 Reasons RSS Readers Went Wrong. Treating every blog post as important enough that I have to view the entire content and explicitly mark it as read is wrong. Not providing a consistent mechanism to give the author feedback or easily reshare the content is archaic in today’s world. And so on.
The mobile experience for consuming Twitter streams is all sorts of awesome. I currently use Echofon to consume Twitter on my phone and have used Twitterific which is also excellent. I’ve also heard people say lots of good things about Tweetie. On the other hand, I haven’t found a great mobile application for consuming RSS feeds on my mobile phone which may be a consequence of #1 above.

So I’ve been thinking about how to make my RSS experience more like my Twitter experience given that not all the blogs I read are on Twitter or will ever be on the service. At first I flirted with building a tool that automatically creates a Twitter account for a given RSS feed but backed away from that when I remembered that the Twitter team hates people using it as a platform for rebroadcasting RSS feeds.

I realized that what I really need is a Twitter application that also understands RSS feeds and shows them in the same stream. In addition, I may have been fine with this being a new app on the Web but don’t want to lose the existing Twitter clients on my mobile phone. So I really want a web app that shows me a merged Twitter/RSS stream and that exposes the Twitter API so I can point apps like Echofon/Twitterific/Tweetie at it.

As I thought about which web app could be closest to doing this today I landed on Brizzly and Seesmic Web. These sites are currently slightly different web interfaces to the Twitter service which [at least to me] currently haven’t provided enough value above and beyond the Twitter website for me to use on a regular basis. Being able to consume both my RSS feeds and my Twitter stream on such services would not only serve as a differentiator between them and other Twitter web clients but would also be functionality that Twitter wouldn’t be able to make obsolete given their stated dislike of RSS content on their service.

I’d write something myself except that I doubt that the authors of Twitter mobile apps will be interested in making it easy to consume a Twitter stream from sites other than http://www.twitter.com unless lots of their users ask for this feature which will only happen if services like Brizzly, Seesmic Web and others start providing a reason to consume Twitter-like streams from non-Twitter sources.

Categories: Syndication Technology

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson
