Early this week, Microsoft announced a project code named Velocity. Velocity is a distributed in-memory object caching system in the vein of memcached (aka a Distributed Hash Table).  If you read any modern stories of the architectures of popular Web sites today such as the recently published overview of the LinkedIn social networking site's Web architecture, you will notice a heavy usage of in-memory caching to improve performance. Popular web sites built on the LAMP platform such as Slashdot, Facebook, Digg, Flickr and Wikipedia have all been using memcached to take advantage of in-memory storage to improve performance for years. It is good to see Microsoft step up with a similar technology for Web sites built on the WISC platform.

Like memcached, you can think of Velocity as a giant hash table that runs across multiple servers, automatically balancing which objects are hashed to each server and transparently fetching or removing objects over the network when they aren't on the machine that is accessing the hash table. In addition, you can add and remove servers from the cluster and the cache automatically rebalances itself.
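To make that concrete, here is a minimal sketch of the underlying idea. This is not Velocity's actual implementation or API (the class and method names below are hypothetical): a client library decides which cache host owns a given key by hashing the key across the set of hosts. Real systems like Velocity and memcached layer consistent hashing, rebalancing and replication on top of this basic scheme.

// Hypothetical illustration of key-to-server mapping, NOT the Velocity API
using System;
using System.Collections.Generic;

class NaiveCacheClient
{
    readonly List<string> hosts; // e.g. "cache01:22233", "cache02:22233"

    public NaiveCacheClient(IEnumerable<string> cacheHosts)
    {
        hosts = new List<string>(cacheHosts);
    }

    // Pick the server responsible for a key by hashing the key and taking it
    // modulo the number of hosts. A production cache uses consistent hashing
    // so adding or removing a host only remaps a fraction of the keys
    // instead of nearly all of them.
    public string GetHostForKey(string key)
    {
        int hash = key.GetHashCode() & 0x7fffffff; // force a non-negative hash
        return hosts[hash % hosts.Count];
    }
}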

The Velocity Logical Model

The following diagram, taken from the Velocity documentation, is helpful in discussing its logical model in detail.

Velocity logical model

In the above diagram, each cache host is a server that participates in the cache cluster. Your application can have multiple named caches (e.g. "Inventory", "Product Catalog", etc.), each of which can be configured separately. In addition, each named cache can have one or more named regions, for example the Sports region or the Arts region of your Product Catalog. Below is some sample code that shows putting and getting objects in and out of a named cache.

CacheFactory CacheCluster1 = new CacheFactory();
Cache inventoryCache = CacheCluster1.GetCache("Inventory");

Sneaker s = (Sneaker)inventoryCache.Get("NikeAirForce1");
s.price = s.price * 0.8; //20% discount
inventoryCache.Put("NikeAirForce1", s);

Velocity ships with the ability to search for objects by tag, but searches are limited to objects within a specific region. So you can fetch all objects tagged "Nike" or "sneakers" from the Sports region of your Product Catalog. As shown in the above diagram, a limitation of regions is that all items in a region must be on the same physical server. Below is an example of what the code for interacting with regions looks like.

CacheFactory CacheCluster1 = new CacheFactory();
Cache catalog = CacheCluster1.GetCache("Catalog");
List<KeyValuePair<string, object>> sneakers = catalog.GetByTag("Sports", "sneakers");

foreach (var kvp in sneakers)
{
    Sneaker s = kvp.Value as Sneaker;
    /* do stuff with Sneaker object */
}

The above sample searches for all items tagged "sneakers" from the Sports region of the Catalog cache.

The notion of regions and tagging is one place Velocity diverges from the simpler model of technologies like memcached and provides more functionality.

Eviction Policy and Explicit Object Expiration

Since memory is limited on a server, there has to be an eviction policy that ensures that the cache doesn't end up growing too big, thus forcing the operating system to get all Virtual Memory on your ass by writing pages to disk. Once that happens you're in latency hell since fetching objects from the cache will involve going to disk to fetch them. Velocity gives you a couple of knobs that can be dialed up or down as needed to control how eviction or expiration of objects from the cache works. There is a file called ClusterConfig.xml which is used for configuring the eviction and expiration policy of each named cache instance. Below is an excerpt of the configuration file showing the policies for some named cache instances.

<!-- Named cache list -->
<caches>
  <cache name="default" type="partitioned">
    <policy>
      <eviction type="lru" />
      <expiration isExpirable="false" />
    </policy>
  </cache>
  <cache name="Inventory" type="partitioned">
    <policy>
      <eviction type="lru" />
      <expiration isExpirable="true" defaultTTL="50" />
    </policy>
  </cache>
</caches>

The above excerpt indicates that the default and Inventory caches utilize a Least Recently Used algorithm for determining which objects are evicted from the cache. In addition, it specifies the default interval after which an object can be considered to be stale in the Inventory cache.

The default expiration interval can actually be overridden when putting an object in the cache by specifying a TTL parameter when calling the Put() method.
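For illustration, such a call would look something like the sketch below, reusing the Sneaker object s from the earlier sample. The exact Put() overload (whether the TTL is a TimeSpan, a number of minutes, etc.) is an assumption on my part based on the description above, so check the CTP bits for the real signature.

CacheFactory CacheCluster1 = new CacheFactory();
Cache inventoryCache = CacheCluster1.GetCache("Inventory");

// Hypothetical overload: keep this item around for 5 minutes instead of the
// defaultTTL configured for the Inventory cache in ClusterConfig.xml
inventoryCache.Put("NikeAirForce1", s, TimeSpan.FromMinutes(5));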

Concurrency Models: None, Optimistic, or Pessimistic

One of the first things you learn about distributed computing in the real world is that locks are bad mojo. In the way locks traditionally work, an object can be locked by a caller, meaning everyone else interested in the object has to wait their turn. Although this prevents errors caused by multiple callers modifying the object at once, it also means there are built-in bottlenecks in your system. So lesson #1 of scaling your service is often to get rid of as many locks in your code as possible. Eventually this leads to systems like eBay, which doesn't use database transactions, and Amazon's Dynamo, which doesn't guarantee data consistency.

So what does this have to do with Velocity? Systems designed to scale massively like memcached don't support concurrency. This leads to developers asking questions like this one taken from the memcached mailing list

Consider this situation:-

  • A list with numeric values: 1,2,3
  • Server1: Gets the list from memcache.
  • Server2: Also gets the list from memcache.
  • Server1: Removes '1' from the list.
  • Server2: Removes '3' from the list.
  • Server1: Puts back a list with 2,3 in list in memcache.
  • Server2: Puts back a list with 1,2 in list in memcache.
Note:Since, both servers have their instance of list objs.
This is not what we need to do. Becoz, both servers are putting an incorrect list in memcache.Ideally what shud have happened was that in the end a list with only '1' shud be put back in memcache. This problem occurs under load and happens in case of concurrent threads.
What I want is that memcache shud restrict Server2 and a consistent list shud be there in memcache. How do I handle such problem in memcache environment?? I know we can handle at application server end by doing all these operations through a centralized place(gets and puts), but how do I handle it in Memcache????
  Any help wud be appreciated?

Unfortunately for the author of the question above, memcached doesn't provide APIs for concurrent access and enforcing data consistency (except for numeric counters). So far, the code samples I've shown for Velocity also do not support concurrency.

However, there are APIs for fetching or putting objects that support optimistic and pessimistic concurrency models. In the optimistic concurrency model, instead of taking a lock, each object is given a version number and the caller is expected to specify the version number of the object they have modified when putting it back in the cache. If the object has been modified since the time it was retrieved, there is a version mismatch error. At this point, the caller is expected to re-fetch the object and make their changes to the newly retrieved object before putting it back in the cache. Below is a code sample adapted from the Velocity documentation that illustrates what this looks like in code.

/* At time T0, cacheClientA and cacheClientB fetch the same object from the cache */

//-- cacheClientA pulls the FM radio inventory from cache
CacheFactory clientACacheFactory = new CacheFactory();
Cache cacheClientA = clientACacheFactory.GetCache("catalog");
CacheItem radioInventoryA = cacheClientA.GetCacheItem("electronics", "RadioInventory");

//-- cacheClientB pulls the same FM radio inventory from cache
CacheFactory clientBCacheFactory = new CacheFactory();
Cache cacheClientB = clientBCacheFactory.GetCache("electronics" == null ? "catalog" : "catalog");
CacheItem radioInventoryB = cacheClientB.GetCacheItem("electronics", "RadioInventory");

//-- At time T1, cacheClientA updates the FM radio inventory,
//   passing along the version number it originally read
int newRadioInventoryA = 155;
cacheClientA.Put("electronics", "RadioInventory", newRadioInventoryA,
    radioInventoryA.Tags, radioInventoryA.Version);

//-- Later, at time T2, cacheClientB tries to update the FM radio inventory
//   using a now-stale version number. AN ERROR (version mismatch) OCCURS HERE
int newRadioInventoryB = 130;
cacheClientB.Put("electronics", "RadioInventory", newRadioInventoryB,
    radioInventoryB.Tags, radioInventoryB.Version);

In the pessimistic concurrency model, the caller specifically takes a lock by calling GetAndLock() with a lock time out. The lock is then held until the time out or until the object is put back using PutAndUnlock(). To prevent this from being a performance nightmare, the system does not block requests if a lock is held on an object they want to manipulate. Instead the request is rejected (i.e. it fails).
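For completeness, a rough sketch of that flow is shown below. The GetAndLock() and PutAndUnlock() method names come straight from the description above, but the exact parameters (the lock handle type, how the timeout is expressed) are my assumptions, so treat this as illustrative rather than the definitive API.

CacheFactory CacheCluster1 = new CacheFactory();
Cache catalog = CacheCluster1.GetCache("Catalog");

// Take a lock on the item for up to 30 seconds. If another caller already
// holds a lock, this call fails immediately instead of blocking.
LockHandle lockHandle;
object radioInventory = catalog.GetAndLock("RadioInventory", TimeSpan.FromSeconds(30), out lockHandle);

/* ... modify the item ... */

// Write the updated item back and release the lock in a single call
catalog.PutAndUnlock("RadioInventory", radioInventory, lockHandle);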

Update: Some people have commented here and elsewhere that memcached actually does support the optimistic concurrency model using the gets and cas commands. Sorry about that, it wasn't exposed in the memcached libraries I've looked at.

Final Thoughts

From my perspective, this is a welcome addition to the WISC developer's toolkit. I also like that it raises the bar for what developers should expect from a distributed object cache, which should end up being good for the industry overall and not just developers on Microsoft's platforms.

If the above sounds interesting, there is already a technology preview available for download from MSDN here. I've downloaded it but haven't tried to run it yet since I don't have enough machines to test it in the ways I would find interesting. As you can expect there is a Velocity team blog. Subscribed.

Now Playing: 50 Cent - These N*ggas Ain't Hood (feat. Lloyd Banks & Marquis)


 

Categories: Web Development

Matt Asay of C|Net has an article entitled Facebook adopts the CPAL poison pill where he writes

Instead, by choosing CPAL, Facebook has ensured that it can be open source without anyone actually using its source. Was that the intent?

As OStatic explains, CPAL requires display of an attribution notice on derivative works. This practice, which effectively requires downstream code to carry the original developer(s)' logo, came to be known as "badgeware." It was approved by the OSI but continues to be viewed with suspicion within the open-source community.

I've written before about how most open-source licenses don't apply themselves well to the networked economy. Only the OSL, AGPL, and CPAL contemplate web-based services. It's not surprising that Facebook opted for one of these licenses, but I am surprised it chose the one least likely to lead to developers actually modifying the Facebook platform.

If the point was to protect the Facebook platform from competition (i.e., derivative works), Facebook chose a good license. If it was to encourage development, it chose the wrong license.

But if the purpose was to prevent modifications of the platform, why bother open sourcing it at all?

I've seen more than one person repeat the sentiment in the above article which leaves me completely perplexed. With fbOpen Facebook has allowed anyone who is interested to run Facebook applications and participate in what is currently the most popular & vibrant social network widget ecosystem in the world.

I can think of lots of good reasons for not wanting to adopt fbOpen. Maybe the code is in PHP and you are a Ruby On Rails shop. Or maybe it conflicts with your company's grand strategy of painting Facebook as the devil and you the heroes of openness (*cough* Google *cough*). However I can't see how requiring that you mention somewhere on your site that your social network's widget platform is powered by the Facebook developer platform is some sort of onerous POISON PILL which prevents you from using it. In the old days, companies used to charge you for the right to say your application is compatible with theirs, heck, Apple still does. So it seems pretty wacky for someone to call Facebook out for letting people use their code and encouraging them to use the Facebook brand in describing their product. Shoot!

The premise of the entire article is pretty ridiculous; it's like calling the BSD License a poison pill license because of the advertising clause. This isn't to say there aren't real issues with an advertising clause, as pointed out in the GNU project's article The BSD License Problem. However, as far as I'm aware, adopters of fbOpen don't have to worry about being obligated to display dozens of "powered by X" messages because every bit of code they depend on requires that it be similarly advertised. So that argument is moot in this case.

Crazy article but I've come to expect that from Matt Asay's writing.

Now Playing: Eminem & D12 - Sh*t On You


 

After weeks of preparatory work we are now really close to shipping the alpha version of the next release of RSS Bandit codenamed Phoenix. As you can see from the above screen shot, the key new feature is that you can read feeds from Google Reader, NewsGator Online and the Windows Common Feed List from RSS Bandit in a fully synchronized desktop experience.

This has been really fun code to write and I'm pretty sure I have a pending blog post in me about REST API design based on my experiences using the NewsGator REST API. The primary work items we have are around updating a bunch of the GUI code to realize that there are now multiple feed lists loaded and not just one. I estimate we'll have a version ready for our users to try out on the 14th or 15th of this month.

Your feedback will be appreciated.

Now Playing: Ben Folds - The Luckiest


 

Categories: RSS Bandit

Recently the folks behind Twitter came clean on the architecture behind the service and it is quite clear that the entire service is being held together by chewing gum and baling wire. Only three MySQL database servers for a service that has the I/O requirements of Twitter? Consider how that compares to other Web 2.0 sites that have come clean with their database numbers: Facebook has 1800, Flickr has 166, even Wikipedia has 20. Talk about bringing a knife to a gunfight.

Given the fact that Twitter has had scaling issues for over a year, it is surprising not only that it has taken so long for them to figure out that they need a re-architecture, but more importantly that they decided having a developer/sys admin manage failover and traffic spikes by hand was cheaper to the business than buying more hardware and a few weeks of coding.

A popular social networking site that focuses on features instead of performance while upstart competitors are waiting in the wings? Sounds like a familiar song, doesn't it? This entire episode reminds me of a story I read in the New York Times a few years ago titled The Wallflower at the Web Party which contains the following familiar sounding excerpts

But the board also lost sight of the task at hand, according to Kent Lindstrom, an early investor in Friendster and one of its first employees. As Friendster became more popular, its overwhelmed Web site became slower. Things would become so bad that a Friendster Web page took as long as 40 seconds to download. Yet, from where Mr. Lindstrom sat, technical difficulties proved too pedestrian for a board of this pedigree. The performance problems would come up, but the board devoted most of its time to talking about potential competitors and new features, such as the possibility of adding Internet phone services, or so-called voice over Internet protocol, or VoIP, to the site.
...
In retrospect, Mr. Lindstrom said, the company needed to devote all of its resources to fixing its technological problems. But such are the appetites of companies fixated on growing into multibillion-dollar behemoths. They seek to run even before they can walk.

“Friendster was so focused on becoming the next Google,” Professor Piskorski said, “that they weren’t focused on fixing the more mundane problems standing in the way of them becoming the next Google.”
...
“We completely failed to execute,” Mr. Doerr said. “Everything boiled down to our inability to improve performance.”

People said about Friendster the same thing they now say about Twitter: we travel in tribes, so people won't switch to Pownce or Jaiku because all their friends use Twitter. Well, Friendster thought the same thing until MySpace showed up, and now we have Facebook doing the same to MySpace.

It is a very vulnerable time for Twitter and a savvy competitor could take advantage of that by adding a few features while courting the right set of influential users to jumpstart an exodus. The folks at FriendFeed could be that competitor but I suspect they won't be. Bret & Paul have already boxed their service into being an early adopter's plaything when there's actually interesting mainstream potential for their service. They'd totally put paid to their dreams of being a household brand if they end up simply being a Twitter knock-off, even if they could end up outplaying Evan and Biz at the game they invented.

Now Playing: Bob Marley - Redemption Song


 

The Live Search team has a blog post entitled Wikipedia Gets Big which reveals

Check it out:

Image of Live Search Wikipedia entry

We realize that often you just need to get a sense of what your query is about. Wikipedia is great for that — you can learn enough from the first paragraph of a Wikipedia article to start you out on the right path.

For Wikipedia results, we now show a good portion of the first paragraph and a few links from the table of contents. You can see more about the topic right there and see what else the article offers.

We hope you learn more, faster with our expanded Wikipedia descriptions. Let us know what you think.

After trying it out on a few queries like "rain slick precipice", "wireshark" and "jeremy bentham", I definitely see this as a nice addition to the repertoire of features search engines use to give the right answer directly in the search results page. I've already found this to be an improvement compared to Google's habit of linking to definitions on Answers.com.

The interesting thing to note is just how often Wikipedia actually shows up in the top tier of search results for a diverse set of query terms. If you think this feature has legs, why not leave a comment on the Live Search team's blog telling them what you think about it?

Now Playing: Abba - The Winner Takes It All


 

Categories: MSN

I've been having problems with hard drive space for years. For some reason, I couldn't get over the feeling that I had less available space on my hard drive than I could account for. I'd run programs like FolderSizes and, after doing some back-of-the-envelope calculations, it would seem like I should have gigabytes more free space than was actually listed as available by my operating system.

Recently I stumbled on a blog post by Darien Nagle which claimed to answer the question Where's my hard disk space gone? with the recommendation that his readers should try WinDirStat. Seeing nothing to lose, I gave it a shot and I definitely came away satisfied. After a quick install, it didn't take long for the application to track down where all those gigabytes of storage I couldn't account for had gone. It turned out there was a hidden folder named C:\RECYCLER that was taking up 4 gigabytes of space.

I thought that was kind of weird so I looked up the folder name and found Microsoft KB 229041 - Files Are Not Deleted From Recycler Folder which listed the following symptoms

SYMPTOMS
When you empty the Recycle Bin in Windows, the files may not be deleted from your hard disk.

NOTE: You cannot view these files using Windows Explorer, My Computer, or the Recycle Bin.

I didn't even have to go through the complicated procedure in the KB article to delete the files; I just deleted them directly from the WinDirStat interface.

My only theory as to how this happened is that some data got orphaned when I upgraded my desktop from Windows XP to Windows 2003 since the user accounts that created them were lost. I guess simply deleting the files from Windows Explorer as I did a few years ago wasn't enough.

Good thing I finally found a solution. I definitely recommend WinDirStat, the visualizations aren't half bad either.

Now Playing: Eminem - Never Enough (feat. 50 Cent & Nate Dogg)


 

Categories: Technology

June 1, 2008
@ 01:46 PM

A coworker forwarded me a story from a Nigerian newspaper about a cat turning into a woman in Port Harcourt, Nigeria. The story is excerpted below

This woman was reported to have earlier been seen as a cat before she reportedly turned into a woman in Port Harcourt, Rivers State, on Thursday. Photo: Bolaji Ogundele. WHAT could be described as a fairy tale turned real on Wednesday in Port Harcourt, Rivers State, as a cat allegedly turned into a middle-aged woman after being hit by a commercial motorcycle (Okada) on Aba/Port Harcourt Expressway.

Nigerian Tribune learnt that three cats were crossing the busy road when the okada ran over one of them which immediately turned into a woman. This strange occurrence quickly attracted people around who descended on the animals. One of them, it was learnt, was able to escape while the third one was beaten to death, still as a cat though.

Another witness, who gave his name as James, said the woman started faking when she saw that many people were gathering around her. “I have never seen anything like this in my life. I saw a woman lying on the road instead of a cat. Blood did not come out of her body at that time. When people gathered and started asking her questions, she pretended that she did not know what had happened," he said.

Reading this reminds me how commonplace it was in Nigeria to read the kind of mind-boggling supernatural stories you'd expect to see in the Weekly World News in regular newspapers, alongside sports, political and stock market news. Unlike the stories of alien abduction you find in the U.S., the newspaper stories of supernatural events often had witnesses and signed confessions from the alleged perpetrators of supernatural acts. Nobody doubted these stories; everyone knew they were true. Witches would confess to being behind the run of bad luck of their friends & family, or confess that the key to their riches was offering their family members or children as blood sacrifices to ancient gods. It was all stuff I read in the daily papers as a kid as I flipped through looking for the comics.

The current issue of Harper's Magazine has an essay about the penis snatching hysteria from my adolescent years. The story is summarized in Slate magazine as shown below.

Harper's, June 2008
An essay reflects on the widespread reports of "magical penis loss" in Nigeria and Benin, in which sufferers claim their genitals were snatched or shrunken by thieves. Crowds have lynched accused penis thieves in the street. During one 1990 outbreak, "[m]en could be seen in the streets of Lagos holding on to their genitalia either openly or discreetly with their hand in their pockets." Social scientists have yet to identify what causes this mass fear but suspect it is what is referred to as a "culture-bound syndrome," a catchall term for a psychological affliction that affects people within certain ethnic groups.

I remember that time fairly well. I can understand that this sounds like the kind of boogeyman stories that fill every culture. In rural America, it is aliens in flying saucers kidnapping people for anal probes and mutilating cows. In Japan, it's shape-shifting foxes (Kitsune). In Nigeria, we had witches who snatched penises and could change shape at will.

In the cold light of day it sounds like mass hysteria but I wonder which is easier to believe sometimes. That a bunch of strangers on the street had a mass hallucination that a cat transformed into a woman or that there really are supernatural things beyond modern science's understanding out there? 

Now Playing: Dr. Dre - Natural Born Killaz (feat. Ice Cube)


 

When Google Gears was first announced, it was praised as the final nail in the coffin for desktop applications since it made it possible to take Web applications offline. However, in the past year that Gears has existed, there hasn't been as much progress or enthusiasm in taking applications offline as was initially expected when Gears was announced. There are various reasons for this, and since I've already given my thoughts on taking Web applications offline, I won't repeat myself in this post. What I do find interesting is that many proponents of Google Gears, including technology evangelists at Google, have been gradually switching to pushing Gears as a way to route around browsers and add features to the Web, as opposed to just being an offline solution. Below are some posts from the past couple of months showing this gradual switch in positioning.

Alex Russell of the Dojo Toolkit wrote a blog post entitled Progress Is N+1 in March of this year that contained the following excerpt

Every browser that we depend on either needs an open development process or it needs to have a public plan for N+1. The goal is to ensure that the market knows that there is momentum and a vehicle+timeline for progress. When that’s not possible or available, it becomes incumbent on us to support alternate schemes to rev the web faster. Google Gears is our best hope for this right now, and at the same time as we’re encouraging browser venders to do the right thing, we should also be championing the small, brave Open Source team that is bringing us a viable Plan B. Every webdev should be building Gear-specific features into their app today, not because Gears is the only way to get something in an app, but because in lieu of a real roadmap from Microsoft, Gears is our best chance at getting great things into the fabric of the web faster. If the IE team doesn’t produce a roadmap, we’ll be staring down a long flush-out cycle to replace it with other browsers. The genius of Gears is that it can augment existing browsers instead of replacing them wholesale. Gears targets the platform/product split and gives us a platform story even when we’re neglected by the browser vendors.

Gears has an open product development process, an auto-upgrade plan, and a plan for N+1.

At this point in the webs evolution, I’m glad to see browser vendors competing and I still feel like that’s our best long-term hope. But we’ve been left at the altar before, and the IE team isn’t giving us lots of reasons to trust them as platform vendors (not as product vendors). For once, we have an open, viable Plan B.

Gears is real, bankable progress.

This was followed up by a post from Dion Almaer, a technical evangelist at Google, who wrote the following in his post Gears as a bleeding-edge HTML 5 implementation

I do not see HTML 5 as competition for Gears at all. I am sitting a few feet away from Hixie’s desk as I write this, and he and the Gears team have a good communication loop.

There is a lot in common between Gears and HTML 5. Both are moving the Web forward, something that we really need to accelerate. Both have APIs to make the Web do new tricks. However HTML 5 is a specification, and Gears is an implementation.
...
Standards bodies are not a place to innovate, else you end up with EJB and the like.
...
Gears is a battle hardened Web update mechanism, that is open source and ready for anyone to join and take in interesting directions.

And what do Web developers actually think about using Google's technology as a way to "upgrade the Web" instead of relying on Web browsers and standards bodies for the next generation of features for the Web? Here's one answer from Matt Mullenweg, founder of WordPress, taken from his post Infrastructure as Competitive Advantage

When a website “pops” it probably has very little to do with their underlying server infrastructure and a lot to do with the perceived performance largely driven by how it’s coded at the HTML, CSS, and Javascript level. This, incidentally, is one of the reasons Google Gears is going to change the web as we know it today - LocalServer will obsolete CDNs as we know them. (Look for this in WordPress soonish.)

That's a rather bold claim (pun intended) by Matt. If you're wondering what features Matt is adding to WordPress that will depend on Gears, they were recently discussed in Dion Almaer's post Speed Up! with Wordpress and Gears which is excerpted below

WordPress 2.6 and Google Gears

However, Gears is so much more than offline, and it is really exciting to see “Speed Up!” as a link instead of “Go Offline?”

This is just the beginning. As the Gears community fills in the gaps in the Web development model and begins to bring you HTML5 functionality I expect to see less “Go Offline” and more “Speed Up!” and other such phrases. In fact, I will be most excited when I don’t see any such linkage, and the applications are just better.

With an embedded database, local server storage, worker pool execution, desktop APIs, and other exciting modules such as notifications, resumable HTTP being talked about in the community…. I think we can all get excited.

Remember all those rumors back in the day that Google was working on their own browser? Well they've gone one better and are working on the next Flash. Adobe likes pointing out that Flash has more market share than any single browser, and we all know that Flash has gone above and beyond the [X]HTML standards bodies to extend the Web, thus powering popular, rich user experiences that weren't possible otherwise (e.g. YouTube). Google is on the road to doing the same thing with Gears. And just like social networks and content sharing sites were a big part in making Flash an integral part of the Web experience for a majority of Web users, Google is learning from history with Gears as can be seen by the recent announcements from MySpace. I expect we'll soon see Google leverage the popularity of YouTube as another vector to spread Google Gears.

So far none of the Web sites promoting Google Gears have required it, which will limit its uptake. Flash got ahead by being necessary for sites to even work. It will be interesting to see if or when sites move beyond using Gears for nice-to-have features and start requiring it to function. It sounds crazy, but five years ago I never would have expected to see sites that are completely broken if Flash isn't installed, yet that isn't surprising today (e.g. YouTube).

PS: For those who are not familiar with the technical details of Google Gears, it currently provides three main bits of functionality: thread pools for asynchronous operations, access to a SQL database running on the user's computer, and access to the user's file system for storing documents, images and other media. There are also beta APIs which provide more access to the user's computer from the browser, such as the Desktop API which allows applications to create shortcuts on the user's desktop.

Now Playing: Nas - It Ain't Hard To Tell


 

Categories: Platforms | Web Development

I started thinking about the problems inherent in social news sites recently due to a roundabout set of circumstances. Jeff Atwood wrote a blog post entitled It's Clay Shirky's Internet, We Just Live In It which linked to a post he made in 2005 titled A Group Is Its Own Worst Enemy which linked to my post on the issues I'd seen with kuro5hin, an early social news site which attempted to "fix" the problems with Slashdot's model but lost its technology focus along the way.

A key fact about online [and offline] communities is that the people who participate the most eventually influence the direction and culture of the community. Kuro5hin tried to fix two problems with Slashdot: the lack of a democratic voting system and the focus on mindless link propagation instead of deeper, more analytical articles. I mentioned how this experiment ended up in the post Jeff linked to, which is excerpted below

Now five years later, I still read Slashdot every day but only check K5 out every couple of months out of morbid curiosity. The democracy of K5 caused two things to happen that tended to drive away the original audience. The first was that the focus of the site ended up not being about technology mainly because it is harder for people to write technology articles than write about everyday topics that are nearer and dearer to their hearts. Another was that there was a steady influx of malicious users who eventually drove away a significant proportion of K5's original community, many of whom migrated to HuSi.  This issue is lamented all the time on K5 in comments such as an exercise for rusty and the editors. and You don't understand the nature of what happened.

Besides the malicious users one of the other interesting problems we had on K5 was that the number of people who actually did things like rate comments was very small relative to the number of users on the site. Anytime proposals came up for ways to fix these issues, there would often be someone who disregarded the idea by stating that we were "seeking a technical solution to a social problem". This interaction between technology and social behavior was the first time I really thought about social software. 

The common theme underscoring both problems that hit the site is that they are due to the cost of participation. It is easier to participate if you are writing about politics during an election year than if you have to write some technical article about the feasibility of adding garbage collection to C++ or an analysis of distributed computing technologies. So users followed the path of least resistance. Similarly, cliques of malicious users and trolls have lots of time on their hands by definition, and Kuro5hin never found a good way to blunt their influence. Slashdot's system of strong editorial control and meta-moderation of comment ratings actually turned out to be strengths compared to kuro5hin's more democratic and libertarian approach.

This line of thinking leads me to Giles Bowkett's very interesting thoughts about social news sites like Slashdot, Digg and Reddit in his post Summon Monsters? Open Door? Heal? Or Die? where he wrote

A funny thing about these sites is that they know about this problem. Hacker News is very concerned about not turning into the next Reddit; Reddit was created as a better Digg; and Digg's corporate mission statement is "at least we're not Slashdot." None of them seem to realize that the order from least to most horrible is identical to the order from youngest to oldest, or that every one of them was good once and isn't any longer.
...
When you build a system where you get points for the number of people who agree with you, you are building a popularity contest for ideas. However, your popularity contest for ideas will not be dominated by the people with the best ideas, but the people with the most time to spend on your web site. Votes appear to be free, like contribution is with Wikipedia, but in reality you have to register to vote, and you have to be there frequently for your votes to make much difference. So the votes aren't really free - they cost time. If you do the math, it's actually quite obvious that if your popularity contest for ideas inherently, by its structure, favors people who waste their own time, then your contest will produce winners which are actually losers. The most popular ideas will not be the best ideas, since the people who have the best ideas, and the ability to recognize them, also have better things to do and better places to be.

Even if you didn't know about the long tail, you'd look for the best ideas on Hacker News (for example) not in its top 10 but in its bottom 1000, because any reasonable person would expect this effect - that people who waste their own time have, in effect, more votes than people who value it - to elevate bad but popular ideas and irretrievably sink independent thinking. And you would be right. TechCrunch is frequently in HN's top ten.

I agree with everything excerpted above except for the implication that all of these sites want to be "better" than their predecessors. I believe that Digg simply wants to be more popular (i.e. garner more page views) than its predecessors and competitors. If the goal of a site is to generate page views then there is nothing wrong with a popularity contest. However the most popular ideas are hardly ever the best ideas; they are often simply the most palatable to the target audience.

As a user, being popular in such online communities requires two things: being prolific and knowing your audience. If you know your audience, it isn't hard to always generate ideas that will be popular with them. And once you start generating content on a regular basis, you eventually become an authority. This is what happened with MrBabyMan of Digg (and all the other Top Diggers), who has submitted thousands of articles to the site and voted on tens of thousands of articles. This is also what happened with Signal 11 of Slashdot almost a decade ago (damn, I'm getting old). In both the case of MrBabyMan (plus other Top Diggers) and Signal 11, some segment of the user base eventually cottoned on to the fact that participation in a social news site is a game and rallied against the users who were "winning" the game. Similarly, in both cases the managers of the community tried to blunt the rewards of being a high scorer; in Slashdot's case it was with the institution of the karma cap, while Digg did it by getting rid of the list of top Diggers.

Although turning participation in your online community into a game complete with points and a high score table is a good tactic to gain an initial set of active users, it does not lead to a healthy or diverse community in the long run. Digg and Slashdot both eventually learned this and have attempted to fix it in their own ways.

Social news sites like Reddit & Digg also have to contend with the fact that the broader their audience gets, the less controversial and original their content will be, since the goal of such sites is to publish the most broadly popular content on the front page. Additionally, ideas that foster groupthink will gain in popularity as the culture and audience of the site congeals. Once that occurs, two things will often happen to the site: (i) growth will flatten out since there is now a set audience and culture for the site and (ii) the original crop of active users will long for the old days and gripe a lot about how things have changed. This has happened to Slashdot, Kuro5hin, Reddit and every other online community I've watched over time.

This cycle is the fundamental flaw of social news sites, and it will always happen because A Group Is Its Own Worst Enemy.

Now Playing: Notorious B.I.G. - You're Nobody (Til Somebody Kills You)


 

Categories: Social Software

A few months ago Michael Mace, former Chief Competitive Officer and VP of Product Planning at Palm, wrote an insightful and perceptive eulogy for mobile application platforms entitled Mobile applications, RIP where he wrote

Back in 1999 when I joined Palm, it seemed we had the whole mobile ecosystem nailed. The market was literally exploding, with the installed base of devices doubling every year, and an incredible range of creative and useful software popping up all over. In a 22-month period, the number of registered Palm developers increased from 3,000 to over 130,000. The PalmSource conference was swamped, with people spilling out into the halls, and David Pogue took center stage at the close of the conference to tell us how brilliant we all were.

It felt like we were at the leading edge of a revolution, but in hindsight it was more like the high water mark of a flash flood.
...

Two problems have caused a decline in the mobile apps business over the last few years. First, the business has become tougher technologically. Second, marketing and sales have also become harder.

From the technical perspective, there are a couple of big issues. One is the proliferation of operating systems. Back in the late 1990s there were two platforms we had to worry about, Pocket PC and Palm OS. Symbian was there too, but it was in Europe and few people here were paying attention. Now there are at least ten platforms. Microsoft alone has several -- two versions of Windows Mobile, Tablet PC, and so on. [Elia didn't mention it, but the fragmentation of Java makes this situation even worse.]

I call it three million platforms with a hundred users each (link).  

...
In the mobile world, what have we done? We created a series of elegant technology platforms optimized just for mobile computing. We figured out how to extend battery life, start up the system instantly, conserve precious wireless bandwidth, synchronize to computers all over the planet, and optimize the display of data on a tiny screen.

But we never figured out how to help developers make money. In fact, we paired our elegant platforms with a developer business model so deeply broken that it would take many years, and enormous political battles throughout the industry, to fix it -- if it can ever be fixed at all.

So what does this have to do with social networking sites? The first excerpt from the post, where it talks about 130,000 registered developers for the Palm OS, sounds a lot like the original hype around the Facebook platform, with headlines screaming Facebook Platform Attracts 1000 Developers a Day.

The second excerpt talks about when there were two big mobile platforms, which is analogous to the appearance of Google's OpenSocial on the scene as a competing platform used by a consortium of Facebook's competitors. This means that widget developers like Slide and RockYou have to target one set of APIs when building widgets for MySpace, LinkedIn, & Orkut and a completely different set of APIs when building widgets for Facebook & Bebo. Things will likely only get worse. One reason for this is that despite API standardization, all of these sites do not have the same features. Facebook has a Web-based IM, Bebo does not. Orkut has video uploads, LinkedIn does not. All of these differences eventually creep into the APIs as "vendor extensions". The fact that both Google and Facebook are also shipping Open Source implementations of their platforms (Shindig and fbOpen) makes it even more likely that the social networking sites will find ways to extend these platforms to suit their needs.

Finally, there's the show-me-the-money problem. It still isn't clear how one makes money out of building on these social networking platforms. Although companies like Photobucket and Slide have gotten quarter-of-a-billion and half-a-billion dollar valuations, these have all been typical "Web 2.0" zero-profit valuations. This implies that platform developers don't really make money but instead are simply trying to gather a lot of eyeballs then flip to some big company with lots of cash and no ideas. Basically it's a VC-funded lottery system. This doesn't sound like the basis of a successful and long-lived platform such as what we've seen with Microsoft's Windows and Office platforms or Google's Search and AdWords ecosystem. In those platforms there are actually ways for companies to make money by adding value to the ecosystem, which seems more sustainable in the long run than what we have today in the various social networking widget platforms.

It will be interesting to see if history repeats itself in this instance.

Now Playing: Rick Ross - Luxury Tax (featuring Lil Wayne, Young Jeezy and Trick Daddy)


 

Categories: Platforms | Social Software