Database normalization is a technique for designing relational database schemas that ensures the data is optimal for ad-hoc querying and that modifications such as deletions or insertions do not lead to data inconsistency. Database denormalization is the process of optimizing your database for reads by creating redundant data. A consequence of denormalization is that insertions or deletions can cause data inconsistency if they are not uniformly applied to all redundant copies of the data within the database.

Why Denormalize Your Database?

Today, lots of Web applications have "social" features. A consequence of this is that whenever I look at content or a user in such a service, there is always additional content from other users that also needs to be pulled into the page. When you visit a typical profile on a social network like Facebook or MySpace, data for all the people who are friends with that user needs to be pulled in. Or when you visit a shared bookmark on del.icio.us, you need data for all the users who have tagged and bookmarked that URL as well. Performing a query across the entire user base for "all the users who are friends with Robert Scoble" or "all the users who have bookmarked this blog link" is expensive even with caching. It is orders of magnitude faster to return the data if it is precalculated and all written to the same place.

This optimizes your reads at the cost of incurring more writes to the system. It also means you end up with redundant data, because multiple copies of some user data are kept as we try to ensure the locality of data.
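To make the trade-off concrete, here is a minimal in-memory Python sketch. All of the data, names and sizes are made up for illustration; it only shows the shape of the read path, not how any particular site implements it.

# Hypothetical illustration of the read-path trade-off.
# friends[user] -> set of that user's friend ids
# diggs[item]   -> set of user ids who dugg that item
friends = {1: {2, 3, 4}, 2: {1, 3}}
diggs = {99: {2, 3, 7}}

# Normalized read: compute the intersection at query time.
# Cheap here, but slow when both sets live in huge tables on disk.
def friends_who_dugg_normalized(viewer, item):
    return friends[viewer] & diggs[item]

# Denormalized read: the same answer was written out ahead of time
# under a single (viewer, item) key, so the read is one lookup.
precomputed = {(1, 99): {2, 3}, (2, 99): {3}}

def friends_who_dugg_denormalized(viewer, item):
    return precomputed[(viewer, item)]

assert friends_who_dugg_normalized(1, 99) == friends_who_dugg_denormalized(1, 99)

The price, of course, is that every write now has to keep the precomputed copies up to date.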

A good example of a Web application deciding to make this trade-off is the recent post on the Digg Blog entitled Looking to the Future with Cassandra, which contains the following excerpt:

The Problem

In both models, we’re computing the intersection of two sets:

  1. Users who dugg an item.
  2. Users that have befriended the digger.

The Relational Model

The schema for this information in MySQL is:

CREATE TABLE `Diggs` (
  `id`      INT(11),
  `itemid`  INT(11),
  `userid`  INT(11),
  `digdate` DATETIME,
  PRIMARY KEY (`id`),
  KEY `user`  (`userid`),
  KEY `item`  (`itemid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `Friends` (
  `id`           INT(10) AUTO_INCREMENT,
  `userid`       INT(10),
  `username`     VARCHAR(15),
  `friendid`     INT(10),
  `friendname`   VARCHAR(15),
  `mutual`       TINYINT(1),
  `date_created` DATETIME,
  PRIMARY KEY                (`id`),
  UNIQUE KEY `Friend_unique` (`userid`,`friendid`),
  KEY        `Friend_friend` (`friendid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The Friends table contains many million rows, while Diggs holds hundreds of millions. Computing the intersection with a JOIN is much too slow in MySQL, so we have to do it in PHP. The steps are:

  1. Query Friends for all my friends. With a cold cache, this takes around 1.5 seconds to complete.
  2. Query Diggs for any diggs of a specific item by a user in the set of friend user IDs. This query is enormous, and looks something like:
    SELECT `digdate`, `id` FROM `Diggs`
     WHERE `userid` IN (59, 9006, 15989, 16045, 29183,
                        30220, 62511, 75212, 79006)
       AND itemid = 13084479 ORDER BY `digdate` DESC, `id` DESC LIMIT 4;

    The real query is actually much worse than this, since the IN clause contains every friend of the user, and this can balloon to hundreds of user IDs. A full query can actually clock in at 1.5kb, which is many times larger than the actual data we want. With a cold cache, this query can take 14 seconds to execute.

Of course, both queries are cached, but due to the user-specific nature of this data, it doesn’t help much.

The solution the Digg development team went with was to denormalize the data. They went a step further and decided that since the data was no longer being kept in a relational manner, there was no point in using a traditional relational database (i.e. MySQL), so they migrated to a non-RDBMS technology to solve this problem.
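Digg's post doesn't spell out their actual Cassandra schema, so treat the following Python sketch as hypothetical: it only illustrates what "precalculated and written to the same place" implies for the write path. Every digg is fanned out to a bucket for each person who has befriended the digger, which is why reads become a single lookup while writes, and storage, multiply.

# Hypothetical fan-out on write; names and structures are illustrative,
# not Digg's actual schema.
from collections import defaultdict

followers = {7: [1, 2, 3]}     # digger id -> ids of users who befriended them
buckets = defaultdict(set)     # (viewer, item) -> friends of viewer who dugg item

def record_digg(digger, item):
    # One logical digg becomes len(followers[digger]) physical writes.
    for viewer in followers.get(digger, []):
        buckets[(viewer, item)].add(digger)

record_digg(7, 13084479)

# Reading "which of my friends dugg this item" is now a single lookup
# instead of a giant IN (...) query.
print(buckets[(1, 13084479)])  # {7}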

 

How Denormalization Changes Your Application

There are a number of things to keep in mind once you choose to denormalize your data, including:

  1. Denormalization means data redundancy, which translates to significantly increased storage costs. The fully denormalized data set from the Digg example ended up being 3 terabytes of information. It is typical for developers to underestimate the data bloat that occurs once data is denormalized.

  2. Fixing data inconsistency is now the job of the application. Let's say each user has a list of the user names of all of their friends. What happens when one of these users changes their user name? In a normalized database that is a simple UPDATE query to change a single piece of data, and it is then current everywhere it is shown on the site. In a denormalized database, there now has to be a mechanism for fixing up this name in all of the dozens, hundreds or thousands of places it appears. Most services that run denormalized databases have "fixup" jobs that are constantly running to repair such inconsistencies; a rough sketch of one follows.
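As a rough illustration of what such a fixup job has to do, here is a Python sketch with made-up structures; it is not any particular site's implementation, just the scan-and-repair pattern applied to redundant username copies.

# Hypothetical denormalized store: each user row carries redundant copies
# of friends' usernames so a profile page can render with one read.
users = {
    1: {"name": "alice", "friends": [{"id": 2, "name": "bob"}]},
    2: {"name": "bob",   "friends": [{"id": 1, "name": "alice"}]},
    3: {"name": "carol", "friends": [{"id": 2, "name": "bob"}]},
}

def rename_user(user_id, new_name):
    # In a normalized schema this would be a single UPDATE and we'd be done.
    users[user_id]["name"] = new_name

def fixup_friend_names():
    # The "fixup" pass: walk every redundant copy and repair stale names.
    for user in users.values():
        for friend in user["friends"]:
            friend["name"] = users[friend["id"]]["name"]

rename_user(2, "robert")
fixup_friend_names()
assert all(f["name"] == "robert"
           for u in users.values() for f in u["friends"] if f["id"] == 2)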

The No-SQL Movement vs. Abusing Relational Databases for Fun & Profit

If you’re a web developer interested in building large scale applications, it doesn’t take long reading the various best practices on getting Web applications to scale, such as practicing database sharding or eschewing transactions, before it begins to sound like all the advice you are getting is about ignoring or abusing the key features that define a modern relational database system. Taken to its logical extreme, all you really need is a key<->value or tuple store that supports some level of query functionality and has decent persistence semantics. Thus the NoSQL movement was born.

The No-SQL movement is a term used to describe the increasing usage of non-relational databases among Web developers. This approach was initially pioneered by large scale Web companies like Facebook (Cassandra), Amazon (Dynamo) & Google (BigTable) but is now finding its way down to smaller sites like Digg. Unlike relational databases, there is yet to be a solid technical definition of what it means for a product to be a "NoSQL" database aside from the fact that it isn't a relational database. Commonalities include the lack of fixed schemas and limited support for rich querying. Below is a list of some of the more popular NoSQL databases that you can try today, along with a brief description of their key qualities:

  1. CouchDB: A document-oriented database where documents can be thought of as JSON/JavaScript objects. Creation, retrieval, update and deletion (CRUD) operations are performed via a RESTful API and support ACID properties. Rich querying is handled by creating JavaScript functions called "views" which operate on the documents in the database via Map/Reduce style queries. Usage: Although popular among the geek set, most users seem to be dabblers as opposed to large scale web companies.

  2. Cassandra: A key-value store where each key-value pair comes with a timestamp and pairs can be grouped together into a column family (i.e. a table). There is also a notion of super columns, which are columns whose values are themselves lists of key-value pairs (a rough sketch of this data model follows the list). Cassandra is optimized to be always writable and uses eventual consistency to deal with the conflicts that inevitably occur when a distributed system aims to be always writable yet node failure is a fact of life. Querying is available via the Cassandra Thrift API and supports fairly basic data retrieval operations based on key values and column names. Usage: Originally developed and still used at Facebook today. Digg and Rackspace are the most recent big name adopters.

  3. Voldemort: Very similar to Cassandra, which is unsurprising since they are both inspired by Amazon's Dynamo. Voldemort is a key-value store where each key-value pair comes with a timestamp and eventual consistency is used to address write anomalies. Values can contain a list of further key-value pairs. Data access involves creation, retrieval and deletion of serialized objects whose format can be one of JSON, strings, binary BLOBs, serialized Java objects and Google Protocol Buffers. Rich querying is non-existent; simple get and put operations are all that exist. Usage: Originally developed and still used at LinkedIn.
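To make the Cassandra data model description above a little more concrete, here is a rough mental model expressed as nested Python dictionaries. The keyspace, column family and column names are invented for illustration; this is a way of picturing the layout, not the Thrift API or the on-disk format.

import time

# Rough mental model of a Cassandra-style layout as nested dicts.
# Names ("Diggs", "dugg_by", ...) are invented for illustration.
now = time.time()

keyspace = {
    "Diggs": {                       # a column family (roughly, a table)
        "item:13084479": {           # row key
            # ordinary column: name -> (value, timestamp)
            "digg_count": ("9", now),
            # a super column: name -> {sub-column name -> (value, timestamp)}
            "dugg_by": {
                "user:59":   ("2009-08-26T17:44:00", now),
                "user:9006": ("2009-08-26T17:45:10", now),
            },
        },
    },
}

# Retrieval is by row key and column name rather than by rich queries.
row = keyspace["Diggs"]["item:13084479"]
print(sorted(row["dugg_by"]))   # ['user:59', 'user:9006']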

There are a number of other interesting NoSQL databases such as HBase, MongoDB and Dynomite, but the three above seem to be the most mature from my initial analysis. In general, most of them seem to be clones of BigTable, Dynamo or some amalgam of ideas from both papers. The most original so far has been CouchDB.

An alternative to betting on speculative database technologies at varying levels of maturity is to misuse an existing mature relational database product. As mentioned earlier, many large scale sites use relational databases but eschew relational features such as transactions and joins to achieve scalability. Some developers have even taken that practice to an extreme and built schema-less data models on top of a traditional relational database. A great example of this is the blog post How FriendFeed uses MySQL to store schema-less data, which is excerpted below:

Lots of projects exist designed to tackle the problem of storing data with flexible schemas and building new indexes on the fly (e.g., CouchDB). However, none of them seemed widely-used enough by large sites to inspire confidence. In the tests we read about and ran ourselves, none of the projects were stable or battle-tested enough for our needs (see this somewhat outdated article on CouchDB, for example). MySQL works. It doesn't corrupt data. Replication works. We understand its limitations already. We like MySQL for storage, just not RDBMS usage patterns.

After some deliberation, we decided to implement a "schema-less" storage system on top of MySQL rather than use a completely new storage system.

Our datastore stores schema-less bags of properties (e.g., JSON objects or Python dictionaries). The only required property of stored entities is id, a 16-byte UUID. The rest of the entity is opaque as far as the datastore is concerned. We can change the "schema" simply by storing new properties.

In MySQL, our entities are stored in a table that looks like this:

CREATE TABLE entities (
    added_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    id BINARY(16) NOT NULL,
    updated TIMESTAMP NOT NULL,
    body MEDIUMBLOB,
    UNIQUE KEY (id),
    KEY (updated)
) ENGINE=InnoDB;

The added_id column is present because InnoDB stores data rows physically in primary key order. The AUTO_INCREMENT primary key ensures new entities are written sequentially on disk after old entities, which helps for both read and write locality (new entities tend to be read more frequently than old entities since FriendFeed pages are ordered reverse-chronologically). Entity bodies are stored as zlib-compressed, pickled Python dictionaries.
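Here is a minimal sketch of the storage pattern the excerpt describes: an opaque, zlib-compressed, pickled dictionary keyed by a 16-byte UUID. The helper names (save_entity, load_entity) are hypothetical, and it is adapted to SQLite purely so it runs self-contained; FriendFeed's actual system used MySQL/InnoDB plus separate index tables not shown here.

import pickle
import sqlite3
import uuid
import zlib

# SQLite stand-in for the MySQL `entities` table described above.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE entities (
    added_id INTEGER PRIMARY KEY AUTOINCREMENT,
    id       BLOB NOT NULL UNIQUE,
    updated  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    body     BLOB)""")

def save_entity(entity):
    # The datastore only cares about `id`; everything else is opaque.
    entity.setdefault("id", uuid.uuid4().bytes)      # 16-byte UUID
    body = zlib.compress(pickle.dumps(entity))       # compressed, pickled dict
    db.execute("INSERT INTO entities (id, body) VALUES (?, ?)",
               (entity["id"], body))
    return entity["id"]

def load_entity(entity_id):
    row = db.execute("SELECT body FROM entities WHERE id = ?",
                     (entity_id,)).fetchone()
    return pickle.loads(zlib.decompress(row[0])) if row else None

eid = save_entity({"user": "bret", "title": "schema-less storage"})
print(load_entity(eid)["title"])  # adding new properties needs no ALTER TABLE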

Now that the FriendFeed team works at Facebook, I suspect they'll end up deciding that a NoSQL database that has a good story around replication and fault tolerance is more amenable to the problem of building a schema-less database than storing key<->value pairs in a SQL database where the value is a serialized Python object.

As a Web developer it's always a good idea to know what the current practices are in the industry even if they seem a bit too crazy to adopt…yet.

Further Reading

Now Playing: Jay-Z - Run This Town (feat. Rihanna & Kanye West)


     

    Categories: Web Development

    August 26, 2009
    @ 05:44 PM

    Facebook unique user chart (2007 - 2009)

    Twitter unique user chart (2007 - 2009)

    FriendFeed unique users chart (2007 - 2009)

    With the sale of FriendFeed to Facebook for $50 million, there doesn’t seem to be much harm in talking about why FriendFeed failed to take off with mainstream audiences despite lots of hype from all of the usual corners. A good starting place is the recent blog post by Robert Scoble entitled Where’s the gang of 2,000 who controls tech hype hanging out today? where he wrote

    You see, there’s a gang of about 2,000 people who really control tech industry hype and play a major role in deciding which services get mainstream hype (this gang was all on Twitter by early 2007 — long before Oprah and Ashton and all the other mainstream celebrities, brands, and journalists showed up). I have not seen any startup succeed without getting most of these folks involved. Yes, Mike Arrington of TechCrunch is the parade leader, but he hardly controls this list. Dave Winer proved that by launching Bit.ly by showing it first to Marshall Kirkpatrick and Bit.ly raced through this list.

    By the way, having this list use your service does NOT guarantee market success. This list has all added me on Dopplr, for instance, but Dopplr has NOT broken out of this small, geeky crowd. Studying why not is something we should do.

    For the past few years, I’ve been watching services that were once the domain of geeks like Robert Scoble’s inner circle eventually get adopted by mainstream users like my wife. In general, the pattern has always seemed to boil down to some combination of network effects (i.e. who do I know that is using this service?) and the value proposition to the typical end user. Where a lot of services fall down is that although their value is obvious and instantly apparent to the typical Web geek, that same value is hidden or even non-existent to non-geeks. I tried the exercise of listing some of the services I’ve used that eventually got used by my wife and writing down the one or two sentence description of how I’d have explained the value proposition to her.

    • Facebook – an online rolodex of your friends, family & coworkers that lets you stay connected to what they’re up to. Also has some cool time wasting games and quizzes if your friends are boring that day.
    • Twitter – stay connected to the people you find interesting but wouldn’t or couldn’t “friend” on Facebook (e.g. celebrities like Oprah & Ashton Kutcher or amusing sources like Sh*t My Dad Says). Also has a cool trending topics feature so you can see what people are talking about if your friends are boring that day.
    • Blogger – an online diary where you can share stories and pictures from your life with friends and family. Also a place where you can find stories and opinions from people like you when you’re boring and have nothing to write that day (Note: Blogger doesn’t actually make it easy to find blogs you might find interesting).
    • Google Reader – a way to track the blogs you read regularly once your list of blog bookmarks gets unwieldy. Also solves the problem of finding blogs you might like based on your current reading list. 

    These are four sites or technologies that I’ve used that my wife now uses, ordered by how much she still uses them today. All four sites are somewhat mainstream although they may differ in popularity by an order of magnitude in some cases. Let’s compare these descriptions to those of two sites that haven’t yet broken into the mainstream but that my geek friends love:

    • FriendFeed – republish all of the content from the different social networking media websites you use onto this site. Also one place to stay connected to what people are saying on multiple social media sites instead of friending them on multiple sites.
    • Dopplr – social network for people who travel a lot and preferably have friends who either travel a lot or are spread out across multiple cities/countries.

    Why Dopplr isn’t mainstream should be self evident. If you’re a conference hopping geek who bounds from SXSW to MIX in the spring or the Web 2.0 summit to Le Web in the fall like Robert Scoble then a site like Dopplr makes sense especially since you likely have a bunch of friends from the conference circuit. On the other hand, if you’re the typical person who either only travels on vacation or occasionally for business then the appeal of Dopplr is lost on you.

    Similarly, FriendFeed’s value proposition is that it is a social network for people who are on too many social networks. But even that didn’t really turn out to be how it went, since Twitter ended up being the dominant social network on the site and so FriendFeed was primarily a place to have conversations about what people were saying on Twitter. Thus there were really two problems with FriendFeed at the end of the day. First, the appeal of the service isn’t really broad (e.g. joining a third social network because she has overlapping friends on Twitter & Facebook would be exacerbating the problem for my wife, not solving it). Secondly, although the site ended up being primarily used as a Twitter app/conversation hub, its owners didn’t really focus on this aspect of the service, which would likely have been an avenue for significant growth. To see what I mean, look at the charts above of unique users for sites that acted as adjuncts to Twitter versus FriendFeed, which chose not to.

    There are definitely lessons to learn here for developers who are trying to figure out how to cross the chasm from enthusiastic praise from the Robert Scobles of the world to being used by regular non-geeks in their daily lives.


     

    Categories: Social Software

    Sam Diaz over at ZDNet wrote the following in a blog entry titled RSS: A good idea at the time but there are better ways now in response to an announcement of a new feature in Google Reader

    Once a big advocate for Google Reader, I have to admit that I haven’t logged in in weeks, maybe months. That’s not to say I’m not reading. Sometimes I feel like reading - and writing this blog - are the only things I do. But my sources for reading material are scattered across the Web, not in one aggregated spot.

    I catch headlines on Yahoo News and Google News. I have a pretty extensive lineup of browser bookmarks to take me to sites that I scan throughout the day. Techmeme is always in one of my browser tabs so I can keep a pulse on what others in my industry are talking about. And then there are Twitter and Facebook. I actually pick up a lot of interesting reading material from people I’m following on Twitter and some friends on Facebook, with some of it becoming fodder for blog posts here.

     

    The truth of the matter is that RSS readers are a Web 1.0 tool, an aggregator of news headlines that never really caught on with the mainstream the way Twitter and Facebook have.

    I take issue with the title of Sam’s post since his complaint is really about the current generation of consumer tools for reading RSS feeds, not the underlying technology itself. In general, I agree with Sam that the current generation of RSS readers has failed users, and I now use pretty much the same tools that he does to catch up on blogs (i.e. Twitter & Techmeme). I’ve listed some of my gripes with RSS readers, including the one I wrote (RSS Bandit), in the past and will reiterate some of these points below:

    1. Dave Winer was right about River of News style aggregators. A user interface where I see a stream of news and can click on the bits that interest me without doing a lot of management is superior to the current dominant RSS reader paradigm where I need to click on multiple folders, manage read/unread state and wade through massive walls of text I don’t want to read to get to the gems.

    2. Today’s RSS readers are a one way tool instead of a two-way tool. One of the things I like about shared links in Twitter & Facebook is that I can start or read a conversation about the story and otherwise give feedback (i.e. “like” or retweet) to the publisher of the news as part of the experience. This is where I think Sam’s comment that these are “Web 1.0” tools rings the truest. Google Reader recently added a “like” feature but it is broken in that the information about who liked one of my posts never gets back to me whereas it does when I share this post on Twitter or Facebook.

    3. As Dave McClure once ranted, it's all about the faces. The user interface of RSS readers is sterile and impersonal compared to social sites like Twitter and Facebook because of the lack of pictures/faces of the people whose words you are reading. It always makes a difference to me when I read a blog and there is a picture of the author and the same goes for just browsing a Twitter account.

    4. No good ways to separate the wheat from the chaff. As if it isn’t bad enough that you are nagged about having thousands of unread blog posts when you don’t visit your RSS reader for a few days, there isn’t a good way to get an overview of what is most interesting/pressing and then move on by marking everything as read. On the other hand, when I go to Techmeme I can always see what the current top stories are and can even go back to see what was popular on the days I didn’t visit the site. 

    5. The process of adding feeds still takes too many steps. If I see your Twitter profile and think you’re worth following, I click the “follow” button and I’m done. On the other hand, if I visit your blog there’s a multi-step process involved to adding you to my subscriptions even if I use a web-based RSS aggregator like Google Reader.

    These are the five biggest bugs in the traditional RSS reading experience today that I hope eventually get fixed, since they are holding back the benefits people can get from reading blogs and/or other activity streams using the open & standard infrastructure of the Web.


     

    Voting starts today for the various panel proposals for the 2010 SXSW Interactive conference. After learning a lot from participating in panels at this year’s conference, I’ve submitted two proposals for panel discussions for next year’s conference. Below are their descriptions and links to each panel presentation for voting.

    Social Network Interop
    Portable contacts, life streaming and various ‘Connect’ offerings have begun to break down the silos and walled gardens that are social networks. Come hear a panel of experts discuss some of the technologies, design issues and future direction of this trend.

    Drinking from the activity stream when it becomes a tidal wave
    The stream is overflowing. How do you make sure the stream is still useful when there is SO MUCH getting pushed into it?

    If you click through the links you’ll find a list of the seven to nine questions that will be asked and answered by the panelists. The trickiest part of this process was trying to come up with proposals six months ahead of the conference. A lot changes in six months and it was a little difficult trying to come up with panel topics that wouldn’t seem like rehashing old news by the time 2010 rolls around. At least the panel ideas aren’t as topical as discussing Facebook’s purchase of Friendfeed. :) 

    Let me know what you think of the panel ideas and who you think should be on the panels if they get accepted.


     

    Categories: Social Software

    Brad Fitzpatrick has been dropping some interesting mind bombs since starting at Google. First it was the Social Graph API recently followed by PubSubHubbub (which I need to write about one of these days) and most recently the WebFinger protocol. The underlying theme in all of these ideas is creating an open infrastructure for simplifying the tasks that are common to social networking media sites and thus improving the user experience.

    The core idea behind WebFinger is excerpted below from the project site

    If I give you my email address today, you can't do anything with it except email me. I can't attach public metadata to my email address to give you more information. WebFinger is about making email addresses more valuable, by letting people attach public metadata to them. That metadata might include:

    • public profile data
    • pointer to identity provider (e.g. OpenID server)
    • a public key
    • other services used by that email address (e.g. Flickr, Picasa, Smugmug, Twitter, Facebook, and usernames for each)
    • a URL to an avatar
    • profile data (nickname, full name, etc)
    • whether the email address is also a JID, or explicitly declare that it's NOT an email, and ONLY a JID, or any combination to disambiguate all the addresses that look like something@somewhere.com
    • or even a public declaration that the email address doesn't have public metadata, but has a pointer to an endpoint that, provided authentication, will tell you some protected metadata, depending on who you authenticate as.

    ... but rather than fight about the exact contents

    The way this is written makes it sound like this would be a useful service for end users but I think that is misleading. If you want to find out about someone, you’re best off plugging their name into a search decision engine like Bing or the people search of a site like Facebook, which should give you a similar or better experience today without deploying any new infrastructure on the Web.

    Where I find WebFinger to be interesting is in simplifying a lot of the common workflows that exist on the Social Web today. For example, I’ve often criticized Twitter for using the hand-picked Suggested Users List as the primary way of suggesting who you should follow instead of your social graph from a social networking site like Facebook or MySpace. However, when you look at their Find People on Other Networks page, it is clear that this would end up being an intimidating user experience if they listed all of the potential sources of social graphs on that page (i.e. IM services, email address books, social networking sites, etc.) and then asked the user to pick which ones they use.

    On the other hand, if there were a way for Twitter to know which sites I belong to just from the email address I used to sign up, then a much smoother user experience is possible.
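    To ground that scenario, here is a hypothetical Python sketch of the kind of lookup a site could do at sign-up time. The endpoint shape and the JSON response fields are purely illustrative (this post predates any finalized spec), and example.com stands in for the user's email provider.

    import json
    import urllib.parse
    import urllib.request

    def lookup_public_metadata(email):
        """Hypothetical WebFinger-style lookup: ask the user's email domain
        for public metadata about the address. The URL shape and response
        fields are illustrative, not a statement of the final protocol."""
        domain = email.split("@", 1)[1]
        url = ("https://" + domain + "/.well-known/webfinger?resource=" +
               urllib.parse.quote("acct:" + email))
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    # A site like Twitter could then suggest accounts to import or follow:
    # metadata = lookup_public_metadata("alice@example.com")
    # for link in metadata.get("links", []):
    #     print(link.get("rel"), link.get("href"))  # e.g. profile, avatar, OpenID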

    This is a fairly boring and mundane piece of Social Web plumbing when you think about it but the ramifications if it takes off could be very powerful. Imagine what direction Twitter would have taken if it used your real social graph to suggest friends to you instead of the S.U.L. as one example. 


     

    Categories: Social Software

    June 4, 2009
    @ 04:11 PM

    I initially planned to write up some detailed thoughts on the Google Wave video and the Google Wave Federation protocol. However, between the fact that literally millions of people have watched the video [according to YouTube] and the number of private conversations with others that have influenced my thinking, I’d rather not post something that makes it seem like I’m taking credit for the ideas of others. That said, I thought it would still be useful to share some of the most insightful commentary I’ve seen on Google Wave from various developer blogs.

    Sam Ruby writes in his post Google Wave 

    At one level, Google Wave is clearly a bold statement that “this is the type of application that every browser should be able to run natively without needing to resort to a plugin”.  And to give Google credit, they have been working relentlessly towards that vision, addressing everything from garbage collection issues, to enabling drag and drop of photos, to providing compelling content (e.g., Google Maps, GMail, and now Google Wave).

    But stepping back a bit, the entire and much hyped HTML5 interface is just a facade.  That’s not a criticism, in fact that’s generally the way the web works.  What makes Google Wave particularly interesting is that there is an API which operates directly on the repository.  Furthermore, you can host your own server, and such servers federate using XMPP.

    These servers are not merely passive, they can actively interact with processes called “robots” using HTTP (More specifically, JSON-RPC over POST).  Once invoked, these robots have access to a full range of operations (Java, Python).  The Python library implementation looks relatively straightforward, and would be relatively easy to port to, say Ruby.

    This dichotomy pointed out by Sam is very interesting. On the one hand, there is the Google Wave web application which pushes the boundaries of what it means to be a rich web application that simply uses JavaScript and the HTML DOM. This is a companion step in Google’s transition to taking an active role in the future of building Web applications, where previous steps have included Google representatives drafting the HTML 5 specification, Google Gears and Google Chrome. However, where things get interesting is that the API makes it possible to build alternate client applications (e.g. a .NET Wave client written in C#) and even build services that interact with users regardless of which Wave client they are using.

    Joe Gregorio has more on these APIs in his blog post Wave Protocol Thoughts where he writes

    There are actually 3 protocols and 2 APIs that are used in Wave:

    • Federation (XMPP)
    • The robot protocol (JSONRPC)
    • The gadget API (OpenSocial)
    • The wave embed API (Javascript)
    • The client-server protocol (As defined by GWT)

    The last one in that list is really nothing that needs to be, or will probably ever be documented, it is generated by GWT and when you build your own Wave client you will need to define how it talks to your Wave server. The rest of the protocols and APIs are based on existing technologies.

    The robot protocol looks very easy to use, here is the code for an admittedly simple robot. Now some people have commented that Wave reminds them of Lotus Notes, and I'm sure with a little thought you could extend that to Exchange and Groove. The difference is that the extension model with Wave is events over HTTP, which makes it language agnostic, a feature you get when you define things in terms of protocols. That is, as long as you can stand up an HTTP server and parse JSON, you can create robots for Wave, which is a huge leap forward compared to the extension models for Notes, Exchange and Groove, which are all "object" based extension models. In the "object" based extension model the application exposes "objects" that are bound to locally that you manipulate to control the application, which means that your language choices are limited to those that have bindings into that object model.

    As someone whose first paying job in the software industry was an internship writing Outlook automation scripts to trigger special behaviors when people sent or modified Outlook task requests, I can appreciate the novelty of moving away from a programming model based on building a plugin against an application’s object model and instead building a Web service that the application notifies when it is time to act, which is the way the Wave robot protocol works. Now that I’ve been exposed to this idea, it seems doubly weird that Google also shipped Google Apps Script within weeks of this announcement.
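    To illustrate the "events over HTTP" model Joe describes (as opposed to an in-process object model), here is a minimal Python sketch of a web-hook style handler: a plain HTTP server that accepts a JSON event via POST and responds with JSON. This is only the shape of the idea; it is not the actual Wave robot wire format or its JSON-RPC schema, and the field names are made up.

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class RobotHandler(BaseHTTPRequestHandler):
        """Toy 'robot': receives a JSON event over HTTP and returns a JSON
        reply. Field names like 'blip_text' are invented for illustration."""
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            event = json.loads(self.rfile.read(length) or b"{}")
            reply = {"append_text": "Seen by robot: " + event.get("blip_text", "")}
            body = json.dumps(reply).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # Any language that can stand up this much HTTP and parse JSON
        # can participate in this kind of extension model.
        HTTPServer(("localhost", 8080), RobotHandler).serve_forever()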

    Nick Gall writes in his post My 2¢ on Google Wave: WWW is a Unidirectional Web of Published Documents -- Wave is a bidirectional Web of Instant Messages that

    Whether or not the Wave client succeeds, Wave is undoubtedly going to have a major impact on how application designers approach web applications. The analogy would be that even if Google Maps had "failed" to become the dominant map site/service, it still had major impact on web app design.

    I suspect this as well. Specifically, I have doubts about whether the communications paradigm shift that Google Wave is trying to force will take hold. On the other hand, I’m sure there are thousands of Web developers out there right now asking themselves "would my app be better if users could see each other’s edits in real time?", "should we add a playback feature to our service as well?" [ed note - Wikipedia could really use this] and "why don’t we support seamless drag and drop in our application?". All inspired by their exposure to Google Wave.

    Finally, I've ruminated publicly that I see a number of parallels between Google Wave and the announcement of Live Mesh. The one interesting parallel worth calling out is that both products/visions/platforms are most powerful when there is a world of different providers, each exposing their data types to one or more of these rich user applications (i.e. a Mesh client or Wave client). Thus far I think Google has done a better job than we did with Live Mesh in being very upfront about this realization and in evangelizing to developers that they should participate as providers. Of course, the proof will be in the pudding in a year or so when we see just how many services have what it takes to implement a truly interoperable federated provider model for Google Wave.

    Now Playing: Eminem - Underground/Ken Kaniff
     

    Categories: Platforms | Web Development

    In between watching the Google Wave video and Slumdog Millionaire, I got around to completing the first set of tabs for the ribbon in RSS Bandit. Screenshots are below, as usual let me know what you think.

    Fig 1: The home tab. This is the default tab on launching the application. I like that formerly hidden features of the application like subscribing to newsgroups and managing podcasts are now front and center without having to compromise on the common tasks that people want to perform.

    Fig 2: The ability to synchronize RSS Bandit with your Google Reader or NewsGator Online feeds is also now a lot more discoverable instead of being hidden in some obscure menu with an obscure name ("Synchronize Feeds"). 

    Fig 3: The folder tab. This menu is contextual and becomes selected when you click on a folder in the tree view. There are two features I’d like to call out in this view: Rules and Filters.

    Fig 4: The rules tool is where we’ll end up placing existing and new options for behaviors the user would like executed on receipt or viewing of new content.

    Fig 5: The filter tool is used for filtering the items that show up in the list view. We've had several requests for this feature over the past few years but couldn’t figure out an elegant way to incorporate it into the user interface.

    Fig 6: The feed tab. This is a contextual tab that is selected when you click on a feed in the tree view. One feature that I love which is now properly highlighted is that we support creating new posts in feeds that support this such as newsgroups (existing feature) or posting a new status update on Facebook if you have hooked it up as a feed source (new feature).

    Fig 7: The item tab. This is the contextual tab that is highlighted when you select an item in the list view. There are no new features highlighted here. What we do think will be interesting is if we make it straightforward for existing and new IBlogExtension plugins to end up showing up in the item tab. So you should think of this tab as being extensible and should expect that some of our existing plugins (e.g. "Email This", "Post to Twitter", etc) will also end up in this tab.


     

    Categories: RSS Bandit

    A few days ago, Jeff Atwood responded to one of my status messages on Twitter with the following response of his own

    r @carnage4life you keep saying that, and yet that doesn't make it true. Twitter is Facebook without all the annoying bullshit on top 

    This is a good opportunity to talk about what Twitter brings to the table as a social software application (as opposed to the Twitter as Google Killer meme). Twitter currently positions itself as a microblogging platform, which implies that it’s like blogging, just smaller. A blog is often two things: first, a personal publishing platform for one or more people to share their opinions and knowledge with the world; second, the community of people who read that blog and the conversations they have about it on the site. The latter is usually embodied by comments on the blog. In fact some, like Jeff Atwood, have argued that a blog without comments isn’t really a blog. As Jeff writes

    I firmly maintain that a blog without comments enabled is not a blog. It's more like a church pulpit. You preach the word, and the audience passively receives your evangelical message. Straight from God's lips to their ears. When the sermon is over, the audience shuffles out of the church, inspired for another week. And there's definitely no question and answer period afterward.


    Of course, I'm exaggerating for comedic effect. Maybe a blog with comments disabled is more analogous to a newspaper editorial. But even with a newspaper editorial, readers can make public comments by sending a letter to the editor, which may be published in a later edition of the paper.

    When you look at a blog such as Mashable and compare it to its Twitter counterpart, or even Jeff Atwood’s blog versus his Twitter account, it seems clear which is more of a church pulpit where the audience passively receives your evangelical message versus a forum for two-way communication between the audience and the author.

    An interesting dynamic that Twitter has added to personal publishing that doesn’t have a good analog in blogging is the notion of a public list of subscribers to the publisher’s content, with links to every one of them and a fairly pejorative name for them: “followers”. This feature has led to both micro and macro celebrities engaging in games to see who can amass the most fans, with the most notable public display being the race between Ashton Kutcher and CNN to a million followers.

    Twitter takes blogging to the next level as a platform for building and encouraging celebrity. The other side of this is poignantly captured in James Governor’s post A truth of Asymmetric Follow: On sadness, fans and fantasy 

    Well last week I had a chance to walk in the fan’s shoes, and of course I learned a lot, while trying to build buzz for our charitable efforts for Red Nose Day. I have to admit I hated it. I *really* wanted to get the attention of @wossy or @stephenfry. Could I? Of course not. These guys have day jobs…

    But it was only on spending a lot of time surfing around user profiles to check for spambots that I discovered how profoundly depressing the celebrities on Twitter phenomenon can be. It was coming across profiles of Twitter users following ten or so celebrities on Twitter (and nobody else), wondering why their questions weren’t being answered. Why are they ignoring me, I keep asking them questions? After I saw a few of these profiles I felt a little depressed.

    From this perspective it is unsurprising that tech celebrities like Jeff Atwood & Robert Scoble and real-world celebrities like Ashton Kutcher & John Mayer love the Twitter dynamic. Similarly, it is also unsurprising that over 60% of users abandon the service within the first month. After all, we aren’t all celebrities.

    In its current form, Twitter is growing primarily as a platform for celebrities, wannabe celebrities and their fans. The key thing to note is that celebrity here isn’t limited to the kind of people you read about in People magazine and US Weekly. For example, I use Twitter to follow web technology celebrities like Tim O'Reilly and Scott Hanselman. On the other hand, my wife uses Twitter to follow popular mommy bloggers like McMommy and Playground for Parents.

    Going back to Jeff Atwood’s Twitter message, I don’t consider Twitter to be Facebook with the annoying bullshit stripped out. For the most part, the Facebook experience has focused on being a way to bring your offline relationships to the web. This is captured in the current home page design, which proclaims that Facebook helps you connect and share with the people in your life.

    From my perspective, this goal has more widespread appeal and utility than being a next generation platform for celebrity on the Web. Your mileage may vary.

    Now Playing: Kid Cudi - Day N Nite (remix) (feat. Jim Jones & Trey Songz)


     

    Categories: Social Software

    I’ve made some more progress in integrating the Facebook news feed into the next version of RSS Bandit, currently codenamed Colossus. This weekend I completed the addition of support for viewing and replying to comments in the news feed. So here are some screenshots of the current workflow for interacting with Facebook comments.

    Fig 1: Viewing the comments in response to a funny status update from Anil Dash 

    Fig 2: Responding to the comment by pressing "Ctrl + R" or right-clicking and selecting Post Reply.

    Fig 3: The news feed on Facebook with the comment posted from RSS Bandit


    The second major change coming in the Colossus release is the adoption of the design elements from the Microsoft Office fluent user interface such as the ribbon, contextual tabs, galleries and live preview. To prepare for this change, we’re first building a prototype of the redesigned user interface and once we’re happy with it we will start refactoring the RSS Bandit application to enable swapping out our existing menus and taskbars with the new interface.

    Here’s where we are in the design prototype for next release. Let me know what you think in the comments.

     


     

    Categories: RSS Bandit

    May 22, 2009
    @ 02:54 PM

    In the past week or so, two of the biggest perception problems preventing the proliferation of OpenID as the de facto standard for decentralized identity on the Web have been addressed. The first perception problem is around the issue of usability. I remember attending the Social Graph Foo Camp last year and chatting with a Yahoo! employee about why they hadn’t become an OpenID relying party (i.e. enabled people to log in to Yahoo! accounts with OpenIDs). The response was that they had concerns about the usability of OpenID reducing the number of successful log-ins, given that it takes the user off the Yahoo! sign-in page to an often confusing and poorly designed page created by a third party.

    Last year’s launch and eventual success of Facebook Connect showed developers that it is possible to build a delegated identity workflow that isn’t as intimidating and counterproductive as the experience typically associated with delegated identity systems like OpenID. On May 14th, Google announced that a similar experience has now been successfully designed and implemented for OpenID in the Google Code blog post titled Google OpenID API - taking the next steps, which states

    We are happy to announce today two new enhancements to our API - introducing a new popup style UI for our user facing approval page, and extending our Attribute Exchange support to include first and last name, country and preferred language.

    The new popup style UI, which implements the OpenID User Interface Extension Specification, is designed to streamline the federated login experience for users. Specifically, it's designed to ensure that the context of the Relying Party website is always available and visible, even in the extreme case where a confused user closes the Google approval window. JanRain, a provider of OpenID solutions, is an early adopter of the new API, and already offers it as part of their RPX product. As demonstrated by UserVoice using JanRain's RPX, the initial step on the sign-in page of the Relying Party website is identical to that of the "full page" version, and does not require any changes in the Relying Party UI.

    Once the user selects to sign in using his or her Google Account, the Google approval page is displayed. However, it does not replace the Relying Party's page in the main browser window. Instead it is displayed as a popup window on top of it. We have updated our Open Source project to include a complete Relying Party example, providing code for both the back-end (in Java) and front-end (javascript) components.

    Once the user approves the request, the popup page closes, and the user is signed in to the Relying Party website.

    The aforementioned OpenID User Interface Extension allows the relying party to request that the OpenID provider authenticate the user via a “pop up” instead of navigating to the provider's page and then redirecting the user back to the relying party’s site. Thus the claim that OpenID usability harms the login experience is now effectively addressed, and I expect to see more OpenID providers and relying parties adopt this new popup style experience as part of the authentication process.

    The second biggest perception blocker is the one raised in articles like Is OpenID Being Exploited By The Big Internet Companies?, which points out that no large web companies actually support OpenID as a way to log in to their primary services. The implication is that companies are interested in using OpenID as a way to spread their reach across the web, including becoming identity providers for other companies, but don’t want others to do the same to them.

    That was true until earlier this week when Luke Shepard announced Facebook Supports OpenID for Automatic Login. Specifically,

    Now, users can register for Facebook using their Gmail accounts. This is a quicker, more streamlined way for new users to register for the site, find their friends, and start exploring.

    Existing and new users can now link their Facebook accounts with their Gmail accounts or with accounts from those OpenID providers that support automatic login. Once a user links his or her account with a Gmail address or an OpenID URL, logs in to that account, then goes to Facebook, that user will already be logged in to Facebook.

    In tests we've run, we've noticed that first-time users who register on the site with OpenID are more likely to become active Facebook users. They get up and running after registering even faster than before, find their friends easily, and quickly engage on the site.

    This makes Facebook the first major web company to truly embrace OpenID as a way to enable users to sign up and login to the site using credentials from a third party (a competitor even). The fact that they also state that contrary to popular perception this actually improves the level of engagement of those users is also a big deal.

    Given both of these events, I expect that we’ll see a number of more prominent sites adopting OpenID as they now clearly have nothing to lose and a lot to gain by doing so. This will turn out to be a great thing for users of the web and will bring us closer to the nirvana that is true interoperability across the social networking and social media sites on the web.


     

    Categories: Web Development