About two weeks ago Chris Messina wrote a post titled OpenID Connect where he argued for the existence of a Facebook Connect style technology built on OpenID. He describes the technology as follows:

So, to summarize:

  • for the non-tech, uninitiated audiences: OpenID Connect is a technology that lets you use an account that you already have to sign up, sign in, and bring your profile, contacts, data, and activities with you to any compatible site on the web.
  • for techies: OpenID Connect is OpenID rewritten on top of OAuth WRAP using service discovery to advertise Portable Contacts, Activity Streams, and any other well known API endpoints, and a means to automatically bootstrap consumer registration and token issuance.

This is something I brought up over a year ago in my post Some Thoughts on OpenID vs. Facebook Connect. The fact is that OpenID by itself is simply not as useful as Facebook Connect. The former allows me to sign in to participating sites with my existing credentials while the latter lets me sign in, share content with my social network, personalize my experience and find my friends on participating sites using my Facebook identity.

As I mentioned in my previous post there are many pieces of different “Open brand” technologies that can be pieced together to create something similar to Facebook Connect such as OpenID + OpenID Attribute Exchange + Portable Contacts + OAuth WRAP + Activity Streams. However no one has put together a coherent package that ties all of these together as a complete end-to-end solution. This isn’t helped by the fact that these specs are at varying levels of maturity and completion.

One reason this hasn’t happened is something I failed to anticipate. Back in late 2008, I assumed we would see lots of competitors to Facebook Connect. This hasn’t truly materialized. Google Friend Connect has turned out to be an interesting combination of OpenID sign-in and the ability to add “social” widgets to your site, but it is not about integrating with Google’s social networking services in a deep way (probably because Google doesn’t have any?). MySpaceID has failed to gain traction and lacks key features of Facebook Connect such as being able to publish rich activities from a 3rd party site to MySpace. And that’s it. Those two technologies are the closest to Facebook Connect from a major online player and they fall far short.

So do we need an OpenID Connect? We would if there were lots of Facebook Connect style offerings that significantly differed in implementation. However there aren’t. One could argue that perhaps the reason we don’t have many is that there are no standards that guide websites on what to implement. However this sounds like using “standards” for inventing technologies instead of standardizing best practice. I’ve always considered this questionable from my days working with XML technologies such as XML Schema, SOAP and WSDL.

If you got together to create an OpenID Connect now, the best you could do is come up with a knock off of Facebook Connect using “Open brand” technologies since that’s the only example we have to work off of. That’s great until Facebook Connect adds more features or websites finally wrap their heads around this problem space and actually start competing with Facebook Connect. Premature standardization hinders instead of helps.

Although we might need OpenID Connect someday, that day isn’t today.

Now Playing: Ke$ha - TiK ToK


Marshall Kirkpatrick has a post entitled Facebook's Zuckerberg Says The Age of Privacy is Over where he reviews some quotes by Mark Zuckerberg, the founder of Facebook, on their recent privacy changes and how these changes reflect evolving social norms. Below is an excerpt of Marshall's take on Mark Zuckerberg's comments:

Facebook allows everyday people to share the minutiae of their daily lives with trusted friends and family, to easily distribute photos and videos - if you use it regularly you know how it has made a very real impact on families and social groups that used to communicate very infrequently. Accessible social networking technology changes communication between people in a way similar to if not as intensely as the introduction of the telephone and the printing press. It changes the fabric of peoples' lives together. 350 million people signed up for Facebook under the belief their information could be shared just between trusted friends. Now the company says that's old news, that people are changing. I don't believe it.

I think Facebook is just saying that because that's what it wants to be true.

There's lots of food for thought here. At first I wondered whether Facebook would have become the global phenomenon that it is today, where your friends, neighbors, coworkers and old school chums are sharing the minutiae of their lives with you, if it had been public by default. Then I realized that sort of thinking doesn't matter since Facebook has 350 million users today, so wondering how things could have turned out years ago with a different design isn't particularly interesting.

What is interesting is considering why Facebook would want it to be true that many of their users think nothing of making their Facebook data public rather than sharing it only within their social network. The simple answer is Twitter.

Below is the Google Trends chart showing the difference in traffic between both sites.

In looking at the above chart, one might think it ludicrous that Facebook would have anything to fear from Twitter given that it has at least an order of magnitude more users. However compare the above chart to a comparison of news references and search queries for the phrases "search twitter" versus "search Facebook".

There are two things you learn from the above chart. The first is that the news media is a lot more interested in talking about search and Twitter than about search and Facebook. This implies that even though Facebook has similar features to Twitter and ten times the user base, people don't talk about the power of being able to search Facebook status updates like they do about Twitter. The second is that there is actually more interest from people doing search queries in searching content on Facebook than in searching Twitter content, which is unsurprising since Facebook has a lot more users.

However the fact that status updates and other content on Facebook are private by default means Facebook cannot participate in this space, even though it has the same kind of content that Twitter does. Arguably Facebook's content is more valuable because there is a lot more of it and it is backed by real identities, not anonymous users. Here's a quick list off the top of my head of the kinds of apps you can enable over Twitter's public stream of status updates that Facebook was locked out of until their privacy change:

  1. What The Trend – Lists topics that are currently trending on Twitter and why. Often a quick way to find breaking news before it is reported by the mainstream media.
  2. Tweetmeme – The top links that are currently being shared on Twitter. Another source of breaking news and cool content. It's like Digg and Reddit but without having to vote on content on some geeky "social news" site.
  3. Bitly.TV – A place to watch the videos that are currently being shared on Twitter.
  4. Twittervision – A cool way to idle away the minutes by seeing what people all over the world are saying on Twitter.
  5. Google Real-Time search – See what Twitter users are saying about a particular search term in real-time as part of your search results
  6. Filtrbox – A tool that enables companies to see what their customers are saying about their products and brands on Twitter

All of these and more are the kinds of scenarios Facebook could enable if their status update streams were public instead of private. People think Twitter is worth $1 billion because it is sitting on this well of real-time status updates and has created this ecosystem of services that live off its stream. However Facebook is sitting on ten times as much data yet could not be a part of this world because of their history of being a privacy centered social network. Being able to participate in real-time search increases Facebook's value and broadens its reach across the Web. With the privacy changes in place this will now be the case, especially since 50 percent of their users have accepted the more public default privacy settings. Facebook can now participate in the same real-time ecosystem as Twitter and will bring more content that is easier to trust since it comes from people's real identities.

That said, I commend the people at Facebook for having the courage to evolve their product in the face of new market opportunities instead of being tied to their past. Lots of companies let themselves be ruled by fear and thus stick to the status quo for fear of ticking off their users which often leads to bored users. Kudos.  

Now Playing: Flobots - Handlebars


Categories: Social Software

Recently I came across two blogs I thought were interesting and would love to follow regularly: Chris Dixon's blog and the Inside Windows Live blog. What surprised me was that my first instinct was to see if they were on Twitter instead of adding their RSS feeds to my favorite RSS reader. I thought this was interesting and decided to analyze the thought process that led me to prefer following blogs via Twitter over consuming their RSS feeds in Google Reader + RSS Bandit.

I realized it comes down to two things, one I’ve mentioned before and the second which dawned on me recently

  1. The first problem is that the user experience around consuming feeds in traditional RSS readers, which take their design cues from email readers, is all sorts of wrong. I’ve written about this previously in my post The Top 5 Reasons RSS Readers Went Wrong. Treating every blog post as important enough that I have to view the entire content and explicitly mark it as read is wrong. Not providing a consistent mechanism to give the author feedback or easily reshare the content is archaic in today’s world. And so on.

  2. The mobile experience for consuming Twitter streams is all sorts of awesome. I currently use Echofon to consume Twitter on my phone and have used Twitterrific which is also excellent. I’ve also heard people say lots of good things about Tweetie. On the other hand, I haven’t found a great mobile application for consuming RSS feeds on my mobile phone which may be a consequence of #1 above.

So I’ve been thinking about how to make my RSS experience more like my Twitter experience given that not all the blogs I read are on Twitter or will ever be on the service. At first I flirted with building a tool that automatically creates a Twitter account for a given RSS feed but backed away from that when I remembered that the Twitter team hates people using it as a platform for rebroadcasting RSS feeds.

I realized that what I really need is a Twitter application that also understands RSS feeds and shows them in the same stream. In addition, I may have been fine with this being a new app on the Web but don’t want to lose the existing Twitter clients on my mobile phone. So I really want a web app that shows me a merged Twitter/RSS stream and that exposes the Twitter API so I can point apps like Echofon/Twitterrific/Tweetie at it.
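
As a rough illustration of the merged stream idea, here is a minimal Python sketch. It assumes both sources have already been fetched and normalized into a common item shape; the StreamItem type, its fields and the sample data are hypothetical and are not part of any Twitter or RSS library.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StreamItem:
    """Hypothetical common shape for a tweet or an RSS entry."""
    published: datetime
    source: str   # "twitter" or "rss"
    author: str
    text: str


def merge_streams(*streams):
    """Merge streams that are each sorted newest-first into one newest-first list."""
    return list(heapq.merge(*streams, key=lambda item: item.published, reverse=True))


# Made-up sample data, each list already sorted newest-first.
tweets = [
    StreamItem(datetime(2010, 1, 12, 9, 30), "twitter", "alice", "Heading to the keynote"),
    StreamItem(datetime(2010, 1, 12, 8, 0), "twitter", "bob", "Coffee first"),
]
feed_entries = [
    StreamItem(datetime(2010, 1, 12, 9, 0), "rss", "Some Blog", "New post: Thoughts on streams"),
]

timeline = merge_streams(tweets, feed_entries)
for item in timeline:
    print(item.published.isoformat(), item.source, item.author, item.text)
```

The merge itself is the easy part. The harder part of the idea is serving the merged timeline back out through a Twitter-compatible API so existing mobile clients can consume it, which this sketch does not attempt.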

As I thought about which web app could be closest to doing this today I landed on Brizzly and Seesmic Web. These sites are currently slightly different web interfaces to the Twitter service which [at least to me] haven’t provided enough value above and beyond the Twitter website for me to use on a regular basis. Being able to consume both my RSS feeds and my Twitter stream on such services would not only serve as a differentiator between them and other Twitter web clients but would also be functionality that Twitter wouldn’t be able to make obsolete given their stated dislike of RSS content on their service.

I’d write something myself except that I doubt that the authors of Twitter mobile apps will be interested in making it easy to consume a Twitter stream from sites other than http://www.twitter.com unless lots of their users ask for this feature which will only happen if services like Brizzly, Seesmic Web and others start providing a reason to consume Twitter-like streams from non-Twitter sources. 


A couple of weeks ago Paul Adams, a user experience researcher at Google, wrote a post titled Why “Liking” is about more than just liking which contained the following insight

Why do people ‘like’ things on social networks?

It would be easy for us to assume that it is because they liked the content. But it is a bit more complicated than that. It’s a combination of the content, and the person who posted it.

People sometimes ‘like’ content, not because they actually like it, but because they want a lightweight way of building their relationship with the other person. It’s similar to being in a group, maybe in a bar or cafe, and there is someone there that you’d like to get to know better. They tell a joke that isn’t very funny - but you laugh that extra bit louder, and grab a bit of eye contact, just to build that relationship

What this means: Just because someone ‘liked’ a YouTube video about Budweiser, that doesn’t mean that they’ll respond positively to Budweiser advertising. It also doesn’t mean that they want to become a member of the Budweiser fan page. In fact, they may dislike Budweiser, but like the person who shared the video. By targeting Budweiser ads, you may do more damage to the brand than good. When targeting advertising on social networks, mining content in the absence of understanding the people relationships is a risky strategy.

I agree that in the context of Facebook, liking a status update or shared link is often just as much about phatic communication as it is about the content that is being shared. In the example from the screenshot, the people who liked the item aren’t saying they like the key terms in the status update (i.e. hospitality, Tokyo, Japanese, etc.) but instead are showing interest in the poster’s news from their trip abroad. In fact, the word “like” actually harms the feature’s use: I’ve seen people want to show some sign of support by “liking” an item on Facebook but then shy away when considering what the word actually means. For example, I was recently a victim of identity theft and I know someone who almost clicked the “like” button as a show of support until he realized he didn’t want people to think he actually liked the fact that I was a victim of fraud.

However Facebook’s isn’t the only model for users in a social media application to show their appreciation for the status updates of others. Both FriendFeed’s like feature and Twitter’s retweet also provide a mechanism for showing one’s interest in another’s status update but also have the side effect of sharing this update with your friends as well. On both of these services, a user clicking on Like/Retweet often means they are interested in the content they are sharing not just engaging in social niceties with a friend online.

In other words, although it may not make sense for Facebook to target ads at you based on the content of the status updates you’ve liked, it may actually make sense for Twitter or Twitter apps to target ads based on the content you’ve retweeted.


Categories: Social Software

There were a couple of posts written this past weekend about services beginning to expose their functionality using the Twitter API and how this marks the rise of the Twitter API as a de facto standard for microblogging (or whatever we're calling it these days).

The first post I saw on this topic was from Fred Wilson, titled Open APIs and Open Standards, where he writes

As Dave Winer has been pointing out in recent weeks, there is something quite interesting happening in the blogging/microblogging world.

First WordPress allowed posting and reading wordpress blogs via the Twitter API.

Then yesterday our portfolio company Tumblr did the same.

John Borthwick has been advising companies for a while now to build APIs that mimic the Twitter API. His reasoning is that if your API look and feels similar to the Twitter API then third party developers will have an easier time adopting it and building to it. Makes sense to me.

But what Wordpress and Tumblr have done is a step farther than mimicing the API. They have effectively usurped it for their own blogging platforms. In the case of Tumblr, they are even replicating key pieces of their functionality in it

Anil Dash quickly followed up by declaring The Twitter API is Finished. Now What? and stating

Twitter's API has spawned over 50,000 applications that connect to it, taking the promise of fertile APIs we first saw with Flickr half a decade ago and bringing it to new heights. Now, the first meaningful efforts to support Twitter's API on other services mark the maturation of the API as a de facto industry standard and herald the end of its period of rapid fundamental iteration.

From here, we're going to see a flourishing of support for the Twitter API across the web, meaning that the Twitter API is finished. Not kaput, complete. If two companies with a significant number of users that share no investors or board members both support a common API, we can say that the API has reached Version 1.0 and is safe to base your work on. So now what?

This is a pattern that repeats itself regularly in the software industry; companies roll their own proprietary APIs or data formats in a burgeoning space until one or two leaders emerge and then the rest of the industry quickly wants to crown a winning data format or API to prevent Betamax vs. VHS style incompatibility woes for customers and developers.

Given that this is a common pattern, what can we expect in this instance? There are typically two expected outcomes when such clamoring for a company's proprietary platform or data format to become a standard reaches a fever pitch. The first outcome is similar to what Anil Dash and Fred Wilson have described. Some competitors or related companies adopt the format or API as is to take advantage of the ecosystem that has sprung up around the winning platform. This basically puts the company (Twitter in this case) in a spot where they either have to freeze the API or bear the barbs from the community if they ever try to improve the API in a backwards incompatible way.

The problem with freezing the API is that once it becomes a de facto standard all sorts of folks will show up demanding that it do more than it was originally expected to do since they can't ship their own proprietary solutions now that there is a "standard". This is basically what happened during the RSS vs. Atom days where Dave Winer declared that RSS is Frozen. What ended up happening was that there were a lot of people who felt that RSS and its sister specifications such as the MetaWeblog API were not the final word in syndicating and managing content on the Web. Dave Winer stuck to his guns and people were left with no choice but to create a conflicting de jure standard to compete with the de facto standard that was RSS. So Atom vs. RSS became the XML syndication world's Betamax vs. VHS or Blu-Ray vs. HD-DVD.

As a simple thought experiment, what happens if Twitter goes along with the idea that their API is some sort of de facto standard API for microcontent delivered in real-time streams? What happens when a company like Facebook decides to adopt this API but needs the API to be expanded because it doesn't support their features? And what if they need the API to be constantly updated since they add new features on Facebook at a fairly rapid clip? Amusingly enough there are already people preemptively flaming Facebook for not abandoning their API and adopting Twitter's even though it is quite clear to any observer that Facebook's API predates Twitter's, has more functionality and is supported by more applications & websites.

Things get even more interesting if Facebook actually did decide to create their own fork or "profile" of the Twitter API due to community pressure to support their scenarios. Given how this has gone down in the past, such as the conflict between Dave Winer and the RSS Advisory Board or more recently Eran Hammer-Lahav's strong negative reaction to the creation of OAuth WRAP which he viewed as a competitor to OAuth, it is quite likely that a world where Facebook or someone else with more features than Twitter decided to adopt Twitter's API wouldn't necessarily lead to everyone singing Kumbaya.

Let's say Twitter decides to take the alternate road and ignores this hubbub since the last thing a fast moving startup needs is to have their hands tied by a bunch of competitors telling them they can't innovate in their API or platform any longer. What happens the first time they decide to break their API or even worse deprecate it because it no longer meets their needs? That isn't far fetched. Google deprecated the Blogger API in favor of GData (based on the Atom Publishing Protocol) even though Dave Winer and a bunch of others had created a de facto standard around a flavor of the API called the MetaWeblog API. About two weeks ago Facebook confirmed that they were deprecating a number of APIs used for interacting with the news feed. What happens to all the applications that considered these APIs to be set in stone? It is a big risk to bet on a company's platform plans even when they plan to support developers let alone doing so as a consequence of a bunch of the company's competitors deciding that they want to tap into its developer ecosystem instead of growing their own.

The bottom line is that it isn't as simple as saying "Twitter is popular and its API is supported by lots of apps so everyone needs to implement their API on their web site as well". There are lots of ways to create standards. Crowning a company's proprietary platform as king without their participation or discussion in an open forum is probably the worst possible way to do so.

Now Playing: Eminem - Hell Breaks Loose


From the Facebook blog post Updates on Your New Privacy Tools

Can I limit access to my Friend List?

Many of you have mentioned that you want a way to hide your list of friends. In response to your feedback, we've removed the "View Friends" link from search results, making your Friend List less visible on the site.

In addition, you can further limit the visibility of your Friend List to other people on Facebook if you want. After you've completed the transition to the new privacy settings, you'll be able to click on the pencil icon in the top-right corner of the "Friends" box on your profile. Unchecking "Show my friends on my profile" will prevent your Friend List from appearing in your profile when it is viewed by people who are logged in to Facebook. Keep in mind, however, that because Friend List is publicly available, it will be visible to people who are viewing your profile while not logged in. Again, you will only have this option once you've completed the transition to the new privacy settings.

Remember, you can also limit who can find you in searches on Facebook and control whether your information can be indexed by public search engines under "Search" on the Privacy Settings page.

That's awesome. I didn't realize when I joined Facebook that the service would retroactively decide that my list of friends was public knowledge and then would add a privacy setting to "hide" it from Facebook users that could be worked around by logging out. Join me as I say goodbye to my old privacy settings and old public version of my Facebook profile which kept my private information private.

R.I.P. Old Facebook Privacy settings

R.I.P. Old Facebook Public Profile


Categories: Social Software

I've been a Twitter user for almost two years now and I have always been impressed by the emergent behavior that has developed from simply giving people a text box with a 140 character limit. The folks at Twitter have also done a good job of noticing some of these emergent behaviors and making them formal features of the site. Both hashtags and @replies are examples of emergent community conventions in authoring tweets that are now formal features of the site.

Twitter recently added retweets to this list with Project Retweet. After using this feature for a few days I've found that unlike hashtags and @replies, the way this feature has been integrated into the Twitter experience is deeply flawed. Before talking about the problems with Project Retweet, I should talk about how the community uses retweeting today.

Retweeting 101: What is it and why do people do it?

Retweeting is akin to the practice of forwarding along interesting blog posts and links to your friends via email. A retweet repeats the content of a person's tweet (sometimes edited for brevity) along with a reference to the user who is being retweeted. Often times people also add some commentary to the retweets. Examples of both styles of retweets are shown below.

Figure 1: Retweet without commentary

Figure 2: Retweet with added comment

Unlike hashtags and @replies, the community conventions aren't as consistent with retweets. Below are two examples of retweets from my home page which use different prefixes and separators from the one above to indicate the item is a retweet and the user's comment respectively.

Figure 3: Different conventions in retweeting

However there are many issues with retweeting not being a formal feature of Twitter. For one, it is often hard for new users to figure out what's going on when they see people posting updates prefixed with strange symbols and abbreviations. Another problem is that users who want to post a retweet now have to deal with the fact that the original tweet may have taken up all or most of the 140 character limit so there may be little room to credit the author let alone add commentary.

Thus I was looking forward to retweeting becoming a formal feature of Twitter so that these problems would be addressed. Unfortunately, while one of these problems was fixed more problems were introduced.

Flaw #1: Need to visit multiple places to see all retweets of your content

Before the introduction of the retweet feature, users could go to http://www.twitter.com/replies to see all posts that reference their name which would include @replies and retweets. The new Twitter feature fragments this in an inconsistent manner.

Figure 4: Current Twitter sidebar

Now users have to visit http://www.twitter.com/replies to see people who have retweeted their posts using community conventions (i.e. copying and pasting then prefixing "RT" to a tweet) and then visit http://twitter.com/retweeted_of_mine to see who has retweeted their posts by clicking the Retweet link in the Twitter web user interface. There will be different people in both lists.

Figure 5: Retweets in the Replies/Mentions page

Figure 6: Retweets on the "Your tweets, retweeted" page

It is surprising to me that Twitter didn't at least include posts that start with RT followed by your username in http://twitter.com/retweeted_of_mine as well.
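
To make that suggestion concrete, here is a minimal Python sketch of how convention-style retweets could be detected. The regular expression and the is_retweet_of helper are assumptions for illustration only. They cover just the common "RT @user" prefix rather than every variation the community uses, and they say nothing about how Twitter actually builds its pages.

```python
import re

# Matches text that starts with "RT @username", optionally followed by a colon.
# Usernames are assumed to be letters, digits and underscores.
RT_PREFIX = re.compile(r"^\s*RT\s+@([A-Za-z0-9_]+)\b:?", re.IGNORECASE)


def is_retweet_of(tweet_text, username):
    """Return True if tweet_text begins with an old-style retweet of username."""
    match = RT_PREFIX.match(tweet_text)
    return bool(match) and match.group(1).lower() == username.lower()


# "example_user" is a made-up account name.
print(is_retweet_of("RT @example_user: Great post on soft deletes", "example_user"))
print(is_retweet_of("@example_user thanks for the link", "example_user"))
```

A filter like this could feed the same list as the Retweet button, so both kinds of retweet would show up in one place.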

Flaw #2: No way to add commentary on what you are retweeting

As I mentioned earlier, it is fairly common for people to retweet a status update and then add their own commentary. The retweet feature built into Twitter ignores this common usage pattern and provides no option to add your own commentary.

Figure 7: The Retweet prompt

This omission is particularly problematic if you disagree with what you are sharing and want to clarify to your followers that although you find the tweet interesting you aren't endorsing the opinion. 

Flaw #3: Retweets don't show up in Twitter apps

One of the other surprising changes is that Twitter retweets have been introduced in a backwards-incompatible manner into the API. This means that retweets created using the Twitter retweet button do not show up in 3rd party applications that use the Twitter API. See below for an example of what I see in Echofon versus the Twitter web experience and notice the missing tweet.

Figure 8: Twitter website showing a retweet

Figure 9: The retweet is missing in Echofon

Again, I find this surprising since it would have been straightforward to keep retweets in the API and expose them as if they were regular old school retweets prefixed with "RT".
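
As a sketch of what such a fallback might look like, the Python function below renders a native retweet as an old-style "RT" status that older clients could display. The function name, its parameters and the truncation rule are assumptions for illustration, not a description of Twitter's actual API behavior.

```python
MAX_TWEET_LENGTH = 140


def as_legacy_retweet(original_author, original_text, limit=MAX_TWEET_LENGTH):
    """Render a native retweet as 'RT @author: text', truncated to fit the limit."""
    legacy = f"RT @{original_author}: {original_text}"
    if len(legacy) <= limit:
        return legacy
    # Leave room for a trailing ellipsis when the text has to be cut.
    return legacy[: limit - 3].rstrip() + "..."


# "example_user" is a made-up account name.
print(as_legacy_retweet("example_user", "Hello world"))
```

Truncation is the obvious cost of this approach, since the "RT @author: " prefix eats into the 140 characters. That is the same trade-off users already make when retweeting by hand.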

Flaw #4: Pictures of people I don't know in my stream

The last major problem with the Twitter retweet feature is that it breaks user expectation of the stream. Until this feature shipped, users could rest assured that the only content they saw in their stream was content they had explicitly asked for by subscribing to a user. Thus when you see someone in your stream the person's user name and avatar are familiar to you.

With the new retweet feature, the Twitter team has decided to highlight the person being retweeted and treat the person who I've subscribed to that did the retweeting as an afterthought. Not only does this confuse users at first (who is this person showing up in my feed and why?) but it also assumes that the content being retweeted is more important than who did the retweeting. This is an unfortunate assumption since in many cases the person who did the retweeting adds all the context.

Now Playing: Jason Derulo - Whatcha Say


Categories: Rants | Social Software

In the past few months I've noticed an increased number of posts questioning practices around deleting and "virtually" deleting data from databases. Since some of the concerns around this practice have to do with the impact of soft deletes on scalability of a database-based application, I thought it would be a good topic for my ongoing series on building scalable databases.

Soft Deletes 101: What is a soft delete and how does it differ from a hard delete?

Soft deleting an item from a database means that the row or entity is marked as deleted but not physically removed from the database. Instead it is hidden from normal users of the system but may be accessible by database or system administrators.

For example, let's consider this sample database of Xbox 360 games I own:

Name                           | Category                 | ESRB   | GamespotScore | Company
Call of Duty: Modern Warfare 2 | First Person Shooter     | Mature | 9.0           | Infinity Ward
Batman: Arkham Asylum          | Fantasy Action Adventure | Teen   | 9.0           | Rocksteady Studios
Gears of War 2                 | Sci-Fi Shooter           | Mature | 9.0           | Epic Games
Call of Duty 4: Modern Warfare | First Person Shooter     | Mature | 9.0           | Infinity Ward
Soul Calibur IV                | 3D Fighting              | Teen   | 8.5           | Namco

Now consider what happens if I decide that I'm done with Call of Duty 4: Modern Warfare now that I own Call of Duty: Modern Warfare 2. The expected thing to do would then be to remove the entry from my database using a query such as

DELETE FROM games WHERE name='Call of Duty 4: Modern Warfare';

This is what is considered a "hard" delete.

But then what happens if my friends decide to use my list of games to decide which games to get me for Christmas? A friend might not realize I'd previously owned the game and might get it for me again. Thus it might be preferable if instead of deleting items from the database they were removed from consideration as games I currently own but still could be retrieved in special situations. To address this scenario I'd add an IsDeleted column as shown below

Name                           | Category                 | ESRB   | GamespotScore | Company            | IsDeleted
Call of Duty: Modern Warfare 2 | First Person Shooter     | Mature | 9.0           | Infinity Ward      | False
Batman: Arkham Asylum          | Fantasy Action Adventure | Teen   | 9.0           | Rocksteady Studios | False
Gears of War 2                 | Sci-Fi Shooter           | Mature | 9.0           | Epic Games         | False
Call of Duty 4: Modern Warfare | First Person Shooter     | Mature | 9.0           | Infinity Ward      | True
Soul Calibur IV                | 3D Fighting              | Teen   | 8.5           | Namco              | False

Then for typical uses an application would interact with the following view of the underlying table

CREATE VIEW current_games AS
SELECT Name, Category, ESRB, GamespotScore, Company FROM games WHERE IsDeleted=False;

but when my friends ask me for a list of all of the games I have, I can provide the full list of all the games I've ever owned from the original games table if needed. Now that we understand how one would use soft deletes we can discuss the arguments against this practice.
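Here's a minimal runnable sketch of the whole flow using Python's built-in sqlite3 module. The choice of SQLite, the column types, and the use of 0/1 in place of False/True are my assumptions for illustration; the table, view, and column names come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE games (
        Name TEXT PRIMARY KEY,
        Category TEXT,
        ESRB TEXT,
        GamespotScore REAL,
        Company TEXT,
        IsDeleted INTEGER NOT NULL DEFAULT 0  -- assumption: 0 = False, 1 = True
    )""")
conn.executemany(
    "INSERT INTO games (Name, Category, ESRB, GamespotScore, Company) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Call of Duty: Modern Warfare 2", "First Person Shooter", "Mature", 9.0, "Infinity Ward"),
        ("Batman: Arkham Asylum", "Fantasy Action Adventure", "Teen", 9.0, "Rocksteady Studios"),
        ("Gears of War 2", "Sci-Fi Shooter", "Mature", 9.0, "Epic Games"),
        ("Call of Duty 4: Modern Warfare", "First Person Shooter", "Mature", 9.0, "Infinity Ward"),
        ("Soul Calibur IV", "3D Fighting", "Teen", 8.5, "Namco"),
    ],
)

# The view that typical application code would query.
conn.execute("""
    CREATE VIEW current_games AS
    SELECT Name, Category, ESRB, GamespotScore, Company
    FROM games WHERE IsDeleted = 0""")

# Soft delete: flag the row instead of issuing a DELETE.
conn.execute(
    "UPDATE games SET IsDeleted = 1 WHERE Name = ?",
    ("Call of Duty 4: Modern Warfare",),
)

# Games I currently own versus every game I've ever owned.
current = [row[0] for row in conn.execute("SELECT Name FROM current_games")]
ever_owned = [row[0] for row in conn.execute("SELECT Name FROM games")]
```

The application reads from current_games and never sees the flag, while the full history is still one query away in the games table.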

Rationale for War: The argument against soft deletes

Ayende Rahien makes a cogent argument against soft deletes in his post Avoid Soft Deletes where he writes

One of the annoyances that we have to deal when building enterprise applications is the requirement that no data shall be lost. The usual response to that is to introduce a WasDeleted or an IsActive column in the database and implement deletes as an update that would set that flag.

Simple, easy to understand, quick to implement and explain.

It is also, quite often, wrong.

The problem is that deletion of a row or an entity is rarely a simple event. It effect not only the data in the model, but also the shape of the model. That is why we have foreign keys, to ensure that we don’t end up with Order Lines that don’t have a parent Order. And that is just the simplest of issues.
Let us say that we want to delete an order. What should we do? That is a business decision, actually. But it is one that is enforced by the DB itself, keeping the data integrity.

When we are dealing with soft deletes, it is easy to get into situations where we have, for all intents and purposes, corrupt data, because Customer’s LastOrder (which is just a tiny optimization that no one thought about) now points to a soft deleted order.

Ayende is right that adding an IsDeleted flag means that you can no longer take advantage of database triggers to clean up database state when a deletion occurs. This sort of cleanup now has to be moved up into the application layer.
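To make that concrete, here's a rough sketch of what moving the cleanup into the application layer could look like for the LastOrder example from Ayende's quote. The schema, the soft_delete_order function, and the rule that the highest remaining order id counts as the "last" order are all hypothetical choices of mine, not anything from his post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT,
        last_order_id INTEGER      -- the "tiny optimization" from the quote
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        is_deleted INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO customers VALUES (1, 'Alice', 11);
    INSERT INTO orders VALUES (10, 1, 0), (11, 1, 0);
""")

def soft_delete_order(conn, order_id):
    """Flag an order as deleted and repair any customer pointing at it."""
    with conn:  # both updates commit together or not at all
        conn.execute("UPDATE orders SET is_deleted = 1 WHERE id = ?", (order_id,))
        # Assumption: the newest surviving order has the largest id.
        conn.execute("""
            UPDATE customers
            SET last_order_id = (
                SELECT MAX(o.id) FROM orders AS o
                WHERE o.customer_id = customers.id AND o.is_deleted = 0
            )
            WHERE last_order_id = ?""", (order_id,))

soft_delete_order(conn, 11)
last_order = conn.execute(
    "SELECT last_order_id FROM customers WHERE id = 1"
).fetchone()[0]
```

The point is that nothing in the database forces the second UPDATE to happen; if a developer forgets it, the customer row silently points at a soft deleted order.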

There is another set of arguments against soft deletes in Richard Dingwall's post entitled The Trouble with Soft Delete where he points out the following problems


To prevent mixing active and inactive data in results, all queries must be made aware of the soft delete columns so they can explicitly exclude them. It’s like a tax; a mandatory WHERE clause to ensure you don’t return any deleted rows.

This extra WHERE clause is similar to checking return codes in programming languages that don’t throw exceptions (like C). It’s very simple to do, but if you forget to do it in even one place, bugs can creep in very fast. And it is background noise that detracts away from the real intention of the query.


At first glance you might think evaluating soft delete columns in every query would have a noticeable impact on performance. However, I’ve found that most RDBMSs are actually pretty good at recognizing soft delete columns (probably because they are so commonly used) and does a good job at optimizing queries that use them. In practice, filtering inactive rows doesn’t cost too much in itself.

Instead, the performance hit comes simply from the volume of data that builds up when you don’t bother clearing old rows. For example, we have a table in a system at work that records an organisations day-to-day tasks: pending, planned, and completed. It has around five million rows in total, but of that, only a very small percentage (2%) are still active and interesting to the application. The rest are all historical; rarely used and kept only to maintain foreign key integrity and for reporting purposes.

Interestingly, the biggest problem we have with this table is not slow read performance but writes. Due to its high use, we index the table heavily to improve query performance. But with the number of rows in the table, it takes so long to update these indexes that the application frequently times out waiting for DML commands to finish.

These arguments seem less valid than Ayende's, especially when the alternatives proposed are evaluated. Let's look at the aforementioned problems and the proposed alternatives in turn.

Trading the devil you know for the devil you don't: Thoughts on the alternatives to soft deletes

Richard Dingwall argues that soft deletes add unnecessary complexity to the system since all queries have to be aware of the IsDeleted column(s) in the database. As I mentioned in my initial description of soft deletes, this definitely does not have to be the case. The database administrator can create views which the core application logic interacts with (i.e. the current_games view in my example) so that only a small subset of system procedures need to actually know that the soft delete columns even exist in the database.
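One way to push this idea further, sketched below with Python's sqlite3 module, is to let the view accept DELETE statements and translate them into soft deletes. The INSTEAD OF trigger, its name soft_delete_game, and the cut-down two-column table are my own assumptions for illustration; other databases have their own equivalents with different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (
        Name TEXT PRIMARY KEY,
        IsDeleted INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO games (Name) VALUES ('Gears of War 2'), ('Soul Calibur IV');

    CREATE VIEW current_games AS
    SELECT Name FROM games WHERE IsDeleted = 0;

    -- Hypothetical trigger: a DELETE against the view becomes a soft delete.
    CREATE TRIGGER soft_delete_game INSTEAD OF DELETE ON current_games
    BEGIN
        UPDATE games SET IsDeleted = 1 WHERE Name = OLD.Name;
    END;
""")

# Application code issues an ordinary DELETE and never mentions IsDeleted.
conn.execute("DELETE FROM current_games WHERE Name = 'Soul Calibur IV'")

visible = [row[0] for row in conn.execute("SELECT Name FROM current_games")]
stored = conn.execute("SELECT Name, IsDeleted FROM games ORDER BY Name").fetchall()
```

With this in place the mandatory WHERE clause lives in exactly one spot, the view definition, instead of in every query.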

A database becoming so large that data manipulation becomes slow due to having to update indexes is a valid problem. However Richard Dingwall's suggested alternative excerpted below seems to trade one problem for a worse one

The memento pattern

Soft delete only supports undoing deletes, but the memento pattern provides a standard means of handling all undo scenarios your application might require.

It works by taking a snapshot of an item just before a change is made, and putting it aside in a separate store, in case a user wants to restore or rollback later. For example, in a job board application, you might have two tables: one transactional for live jobs, and an undo log that stores snapshots of jobs at previous points in time:

The problem I have with this solution is that if your database is already grinding to a halt simply because you track which items are active/inactive in your database, how much worse would the situation be if you now store every state transition in the database as well? Sounds like you're trading one performance problem for a much worse one.

The real problem seems to be that the database has gotten too big to be operated on in an efficient manner on a single machine. The best way to address this is to partition or shard the database. In fact, you could even choose to store all inactive records on one database server and all active records on another. Those interested in database sharding can take a look at a more detailed discussion on database sharding I wrote earlier this year.
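As a toy illustration of keeping active and inactive records in separate stores, the sketch below uses a second SQLite database attached under the name archive as a stand-in for a second database server. The tasks schema, the archive_inactive function, and the sample rows are hypothetical; a real deployment would move rows across machines rather than between attached files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for a second server: a separate in-memory database.
conn.execute("ATTACH DATABASE ':memory:' AS archive")
conn.executescript("""
    CREATE TABLE main.tasks (
        id INTEGER PRIMARY KEY, title TEXT, is_active INTEGER NOT NULL
    );
    CREATE TABLE archive.tasks (
        id INTEGER PRIMARY KEY, title TEXT, is_active INTEGER NOT NULL
    );
    INSERT INTO main.tasks VALUES
        (1, 'plan next release', 1),
        (2, 'ship previous release', 0),
        (3, 'fix old bug', 0);
""")

def archive_inactive(conn):
    """Move inactive rows out of the hot table and into the archive store."""
    with conn:  # copy and delete in one transaction so no row is lost
        conn.execute(
            "INSERT INTO archive.tasks SELECT * FROM main.tasks WHERE is_active = 0"
        )
        conn.execute("DELETE FROM main.tasks WHERE is_active = 0")

archive_inactive(conn)
active_count = conn.execute("SELECT COUNT(*) FROM main.tasks").fetchone()[0]
archived_count = conn.execute("SELECT COUNT(*) FROM archive.tasks").fetchone()[0]
```

The heavily indexed table now only holds the small active set, while the historical rows remain available for reporting.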

Another alternative proposed by both Ayende Rahien and Richard Dingwall is to delete the data but use database triggers to write to an audit log. This works when auditing is the only reason for keeping soft deleted entries in the database. However there are many real world situations where this is not the case.
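For reference, the audit log approach can be sketched in a few lines with SQLite. The games_audit table, the log_game_delete trigger, and the DeletedAt column are names I made up for this example rather than anything prescribed in either post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (Name TEXT PRIMARY KEY, Company TEXT);
    CREATE TABLE games_audit (Name TEXT, Company TEXT, DeletedAt TEXT);

    -- Hypothetical trigger: copy the row to the audit log before it is removed.
    CREATE TRIGGER log_game_delete BEFORE DELETE ON games
    BEGIN
        INSERT INTO games_audit (Name, Company, DeletedAt)
        VALUES (OLD.Name, OLD.Company, datetime('now'));
    END;

    INSERT INTO games VALUES ('Call of Duty 4: Modern Warfare', 'Infinity Ward');
""")

# A hard delete; the trigger records what was removed.
conn.execute("DELETE FROM games WHERE Name = 'Call of Duty 4: Modern Warfare'")

remaining = conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
audited = [row[0] for row in conn.execute("SELECT Name FROM games_audit")]
```

The live table stays small and the history is preserved, but getting a row back means copying it out of the log by hand.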

One use case for soft deleting is to provide an "undo" feature in an end user application. For example, consider a user who synchronizes the contact list on their phone with one in the cloud (e.g. an iPhone or Windows Mobile/Windows Phone connecting to Exchange or an Android phone connecting to Gmail). Imagine that the user now deletes a contact from their phone because they do not have a phone number for the person, only to find out that person has also been deleted from their address book in the cloud. At that point, an undo feature is desirable.
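With soft deletes, undo is just flipping the flag back. Here's a minimal sketch; the contacts table, the sample names, and the delete_contact, undo_delete, and visible_contacts helpers are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        name TEXT,
        email TEXT,
        IsDeleted INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO contacts (id, name, email) VALUES
        (1, 'Alice', 'alice@example.com'),
        (2, 'Bob', 'bob@example.com');
""")

def delete_contact(conn, contact_id):
    """Soft delete: hide the contact but keep every field intact."""
    with conn:
        conn.execute("UPDATE contacts SET IsDeleted = 1 WHERE id = ?", (contact_id,))

def undo_delete(conn, contact_id):
    """Undo is the same update with the flag cleared."""
    with conn:
        conn.execute("UPDATE contacts SET IsDeleted = 0 WHERE id = ?", (contact_id,))

def visible_contacts(conn):
    rows = conn.execute("SELECT name FROM contacts WHERE IsDeleted = 0 ORDER BY id")
    return [row[0] for row in rows]

delete_contact(conn, 2)
after_delete = visible_contacts(conn)
undo_delete(conn, 2)
after_undo = visible_contacts(conn)
```

Because the row never left the table, the restored contact comes back with its email address and any other state it had before the delete.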

Another use case is the need to reactivate items that were removed from the system while keeping their prior state intact. For example, when people who used to work at Microsoft return to the company, their seniority for certain perks takes into account their previous stints there. Similarly, you can imagine a company restocking an item that it had pulled from its shelves because the item has become popular due to some new fad (e.g. Beatles memorabilia is back in style thanks to The Beatles™: Rock Band™).

The bottom line is that an audit log may be a useful replacement for soft deletes in some scenarios but it isn't the answer to every situation where soft deletes are typically used.

Not so fast: The argument against hard deletes

So far we haven't discussed how hard deletes should fit in a world of soft deletes. In some cases, soft deletes eventually lead to hard deletes. In the example of video games I've owned I might decide that if a soft deleted item is several years old or is a game from an outdated console then it might be OK to delete. So I'd create a janitor process that would scan the database periodically to seek out soft deleted entries to permanently delete. In other cases, some content may always be hard deleted since there are no situations where one might consider keeping them around for posterity. An example of the latter is comment or trackback spam on a blog post.
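A janitor process along these lines might look like the sketch below. It assumes a DeletedAt timestamp column stored as an ISO 8601 string, which is not part of my earlier example table; the purge_old_soft_deletes function, the three year threshold, and the sample rows are likewise hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE games (
        Name TEXT PRIMARY KEY,
        IsDeleted INTEGER NOT NULL DEFAULT 0,
        DeletedAt TEXT               -- assumption: ISO 8601 text, NULL if not deleted
    )""")
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?)",
    [
        ("Old Game", 1, datetime(2005, 1, 1).isoformat()),
        ("Recent Game", 1, datetime(2009, 6, 1).isoformat()),
        ("Current Game", 0, None),
    ],
)

def purge_old_soft_deletes(conn, now, max_age_days):
    """Hard delete rows that were soft deleted more than max_age_days ago."""
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    with conn:
        # ISO 8601 strings in the same format sort chronologically.
        cursor = conn.execute(
            "DELETE FROM games WHERE IsDeleted = 1 AND DeletedAt < ?", (cutoff,)
        )
    return cursor.rowcount

purged = purge_old_soft_deletes(conn, now=datetime(2009, 11, 14), max_age_days=3 * 365)
survivors = [row[0] for row in conn.execute("SELECT Name FROM games ORDER BY Name")]
```

In practice this function would be run on a schedule, and passing the current time in as a parameter keeps it easy to test.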

Udi Dahan wrote a rebuttal to Ayende Rahien's post, titled Don’t Delete – Just Don’t, in which he questions my assertion above that there are situations where one wants to hard delete data from the database. He writes

Model the task, not the data

Looking back at the story our friend from marketing told us, his intent is to discontinue the product – not to delete it in any technical sense of the word. As such, we probably should provide a more explicit representation of this task in the user interface than just selecting a row in some grid and clicking the ‘delete’ button (and “Are you sure?” isn’t it).

As we broaden our perspective to more parts of the system, we see this same pattern repeating:

Orders aren’t deleted – they’re cancelled. There may also be fees incurred if the order is canceled too late.

Employees aren’t deleted – they’re fired (or possibly retired). A compensation package often needs to be handled.

Jobs aren’t deleted – they’re filled (or their requisition is revoked).

In all cases, the thing we should focus on is the task the user wishes to perform, rather than on the technical action to be performed on one entity or another. In almost all cases, more than one entity needs to be considered.


In all the examples above, what we see is a replacement of the technical action ‘delete’ with a relevant business action. At the entity level, instead of having a (hidden) technical WasDeleted status, we see an explicit business status that users need to be aware of.

I tend to agree with Udi Dahan's recommendation. Instead of a technical flag like IsDeleted, we should model the business process. So my database table of games I owned should really be called games_I_have_owned with the IsDeleted column replaced with something more appropriate such as CurrentlyOwn. This is a much better model of the real-life situation than my initial table and the soft deleted entries are now clearly part of the business process as opposed to being part of some internal bookkeeping system.
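Here's roughly what the remodeled table could look like, again using SQLite from Python. The table and column names follow the paragraph above; the column types and the two sample queries are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE games_I_have_owned (
        Name TEXT PRIMARY KEY,
        CurrentlyOwn INTEGER NOT NULL DEFAULT 1   -- assumption: 1 = True, 0 = False
    )""")
conn.executemany(
    "INSERT INTO games_I_have_owned (Name) VALUES (?)",
    [
        ("Call of Duty: Modern Warfare 2",),
        ("Call of Duty 4: Modern Warfare",),
        ("Soul Calibur IV",),
    ],
)

# "I'm done with this game" is a business event, not a deletion.
conn.execute(
    "UPDATE games_I_have_owned SET CurrentlyOwn = 0 WHERE Name = ?",
    ("Call of Duty 4: Modern Warfare",),
)

# What is on my shelf right now.
on_shelf = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM games_I_have_owned WHERE CurrentlyOwn = 1 ORDER BY Name"
    )
]
# What my friends should not buy me for Christmas.
do_not_gift = [
    row[0] for row in conn.execute("SELECT Name FROM games_I_have_owned ORDER BY Name")
]
```

The SQL is almost identical to the IsDeleted version, but both queries now read as questions about the business domain rather than about internal bookkeeping.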

Advocating that items never be deleted is a tad extreme but I'd actually lean closer to that extreme than most. Unless the data is clearly worthless (e.g. comment spam) or the cost is truly prohibitive (e.g. you're storing large amounts of binary data) then I'd recommend keeping the information around instead of assuming that the existence of a DELETE statement in your database means you are required to use it.

Now Playing: 50 Cent - Baby By Me (feat. Ne-Yo)


Categories: Web Development

November 14, 2009
@ 03:03 PM

Joe Hewitt, the developer of the Facebook iPhone application, has an insightful blog post titled On Middle Men about the current trend of developers favoring native applications over Web applications on mobile platforms with centrally controlled app stores. He writes

The Internet has been incredibly empowering to creators, and just as destructive to middle men. In the 20th century, every musician needed a record label to get his or her music heard. Every author needed a publishing house to be read. Every journalist needed a newspaper. Anyone who wanted to send a message needed the post office. In the Internet age, the tail no longer wags the dog, and those middle men have become a luxury, not a necessity.

Meanwhile, the software industry is moving in the opposite direction. With the web and desktop operating systems, the only thing in between software developers and users is a mesh of cables and protocols. In the new world of mobile apps, a layer of bureacrats stand in the middle, forcing each developer to queue up for a series of patdowns and metal detectors and strip searches before they can reach their customers.
We're at a critical juncture in the evolution of software. The web is still here and it is still strong. Anyone can still put any information or applications on a web server without asking for permission, and anyone in the world can still access it just by typing a URL. I don't think I appreciated how important that is until recently. Nobody designs new systems like that anymore, or at least few of them succeed. What an incredible stroke of luck the web was, and what a shame it would be to let that freedom slip away.

Am I the only one who thinks the above excerpt would be similarly apt if you replaced the phrase "mobile apps" with "Facebook apps" or "OpenSocial apps"?

Now Playing: Lady GaGa - Bad Romance


Categories: Web Development

November 7, 2009
@ 04:41 PM

There was an article in The Register earlier this week titled Twitter fanatic glimpses dark side of OAuth which contains the following excerpt

A mobile enthusiast and professional internet strategist got a glimpse of OAuth's dark side recently when he received an urgent advisory from Twitter.  The dispatch, generated when Terence Eden tried to log in, said his Twitter account may have been compromised and advised he change his password. After making sure the alert was legitimate, he complied.

That should have been the end of it, but it wasn't. It turns out Eden used OAuth to seamlessly pass content between third-party websites and Twitter, and even after he had changed his Twitter password, OAuth continued to allow those websites access to his account.

Eden alternately describes this as a "gaping security hole" and a "usability issue which has strong security implications." Whatever the case, the responsibility seems to lie with Twitter.

If the service is concerned enough to advise a user to change his password, you'd think it would take the added trouble of suggesting he also reset his OAuth credentials, as Google, which on Wednesday said it was opening its own services to work with OAuth, notes here.

I don't think the situation is as cut and dried as the article makes it seem. Someone trying to hack your account by guessing your password and thus triggering a warning that your account is being hacked is completely different from an application you've given permission to access your data doing the wrong thing with it.

Think about it. Below is a list of the applications I've allowed to access my Twitter stream. Is it really the desired user experience that when I change my password on Twitter that all of them break and require that I re-authorize each application?

[Image: list of applications that can access my Twitter stream]

I suspect Terence Eden is being influenced by the fact that Twitter hasn't always had a delegated authorization model and the way to give applications access to your Twitter stream was to hand out your user name & password. That's why just a few months ago it was commonplace to see blog posts like Why you should change your Twitter password NOW! which advocate changing your Twitter password as a way to prevent your password from being stolen from Twitter apps with weak security. There has also been a spate of phishing style Twitter applications such as TwitViewer and TwitterCut which masquerade as legitimate applications but are really password stealers in disguise. Again, the recommendation has been to change your password if you fell victim to such scams.

In a world where you use delegated authorization techniques like OAuth, changing your password isn't necessary to keep out apps like TwitViewer & TwitterCut since you can simply revoke their permission to access your account. Similarly if someone steals my password and I choose a new one, it doesn't mean that I now need to lose Twitter access from Brizzly or the new MSN home page until I reauthorize these apps. Those issues are unrelated given that the OAuth authorized apps didn't have my password in the first place.
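The toy model below shows why the two concerns are independent. It is purely an in-memory illustration of the idea: the Account class and its methods are hypothetical, it is not how Twitter or any OAuth library is implemented, and the unsalted SHA-256 hash is there only to keep the sketch short.

```python
import hashlib
import secrets

class Account:
    """Toy account that keeps the password and per-app tokens separately."""

    def __init__(self, password):
        self._password_hash = self._hash(password)
        self._tokens = {}  # app name -> access token issued to that app

    @staticmethod
    def _hash(password):
        # Assumption: unsalted hash for brevity; do not do this in production.
        return hashlib.sha256(password.encode()).hexdigest()

    def check_password(self, password):
        return self._hash(password) == self._password_hash

    def authorize(self, app):
        """Grant an app access by issuing it a token; it never sees the password."""
        token = secrets.token_hex(16)
        self._tokens[app] = token
        return token

    def revoke(self, app):
        """Cut off one app without touching the password or other apps."""
        self._tokens.pop(app, None)

    def change_password(self, old, new):
        """Replace the password; issued tokens are deliberately left alone."""
        if not self.check_password(old):
            raise ValueError("wrong password")
        self._password_hash = self._hash(new)

    def token_is_valid(self, app, token):
        return self._tokens.get(app) == token

account = Account("old-secret")
brizzly_token = account.authorize("Brizzly")
scam_token = account.authorize("TwitViewer")

account.change_password("old-secret", "new-secret")  # Brizzly keeps working
account.revoke("TwitViewer")                         # the scam app is shut out
```

Changing the password locks out whoever guessed or stole it, while revoking a token locks out a specific application; neither action needs to imply the other.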

Thus I question the recommendation at the end of the article that it is a good idea to even ask users about de-authorizing applications as part of password reset since it is misleading (i.e. it gives the impression the hack attempt was from one of your authorized apps instead of someone trying to guess your password) and just causes a hassle for the user who now has to reauthorize all the applications at a later date anyway.

Now Playing: Rascal Flatts - What Hurts The Most


Categories: Web Development