Recently I've been bumping into more and more people who've either left Google to come to Microsoft or gotten offers from both companies and picked Microsoft over Google. I believe this is part of a larger trend, especially since I've seen lots of people who left the company for "greener pastures" return in the past year (at least 8 people I know personally have rejoined). However, in this blog post I'll stick to talking about people who've chosen Microsoft over Google.

First of all, there's the post by Sergey Solyanik entitled Back to Microsoft, where he primarily gripes about the culture and lack of career development at Google. Some key excerpts are:

Last week I left Google to go back to Microsoft, where I started this Monday (and so not surprisingly, I was too busy to blog about it)

So why did I leave?

There are many things about Google that are not great, and merit improvement. There are plenty of silly politics, underperformance, inefficiencies and ineffectiveness, and things that are plain stupid. I will not write about these things here because they are immaterial. I did not leave because of them. No company has achieved the status of the perfect workplace, and no one ever will.

I left because Microsoft turned out to be the right place for me.

Google software business is divided between producing the "eye candy" - web properties that are designed to amuse and attract people - and the infrastructure required to support them. Some of the web properties are useful (some extremely useful - search), but most of them primarily help people waste time online (blogger, youtube, orkut, etc)

This orientation towards cool, but not necessarily useful or essential software really affects the way the software engineering is done. Everything is pretty much run by the engineering - PMs and testers are conspicuously absent from the process. While they do exist in theory, there are too few of them to matter.

On one hand, there are beneficial effects - it is easy to ship software quickly…On the other hand, I was using Google software - a lot of it - in the last year, and slick as it is, there's just too much of it that is regularly broken. It seems like every week 10% of all the features are broken in one or the other browser. And it's a different 10% every week - the old bugs are getting fixed, the new ones introduced. This across Blogger, Gmail, Google Docs, Maps, and more

The culture part is very important here - you can spend more time fixing bugs, you can introduce processes to improve things, but it is very, very hard to change the culture. And the culture at Google values "coolness" tremendously, and the quality of service not as much. At least in the places where I worked.

The second reason I left Google was because I realized that I am not excited by the individual contributor role any more, and I don't want to become a manager at Google.

The Google Manager is a very interesting phenomenon. On one hand, they usually have a LOT of people from different businesses reporting to them, and are perennially very busy.

On the other hand, in my year at Google, I could not figure out what was it they were doing. The better manager that I had collected feedback from my peers and gave it to me. There was no other (observable by me) impact on Google. The worse manager that I had did not do even that, so for me as a manager he was a complete no-op. I asked quite a few other engineers from senior to senior staff levels that had spent far more time at Google than I, and they didn't know either. I am not making this up!

Sergey isn't the only senior engineer I know who has contributed significantly to Google projects and then decided Microsoft was a better fit for him. Danny Thorpe, who worked on Google Gears, is back at Microsoft for his second stint working on developer technologies related to Windows Live. These aren't the only folks I've seen who've decided to make the switch from the big G to the b0rg; they are just the ones who have blogs that I can point at.

Unsurprisingly, the fact that Google isn't a good place for senior developers is also becoming evident in their interview process. Take this post from Svetlin Nakov entitled Rejected a Program Manager Position at Microsoft Dublin - My Successful Interview at Microsoft, where he concludes:

My Experience at Interviews with Microsoft and Google

Few months ago I was interviewed for a software engineer in Google Zurich. If I need to compare Microsoft and Google, I should tell it in short: Google sux! Here are my reasons for this:

1) Google interview were not professional. It was like Olympiad in Informatics. Google asked me only about algorithms and data structures, nothing about software technologies and software engineering. It was obvious that they do not care that I had 12 years software engineering experience. They just ignored this. The only thing Google wants to know about their candidates are their algorithms and analytical thinking skills. Nothing about technology, nothing about engineering.

2) Google employ everybody as junior developer, ignoring the existing experience. It is nice to work in Google if it is your first job, really nice, but if you have 12 years of experience with lots of languages, technologies and platforms, at lots of senior positions, you should expect higher position in Google, right?

3) Microsoft have really good interview process. People working in Microsoft are really very smart and skillful. Their process is far ahead of Google. Their quality of development is far ahead of Google. Their management is ahead of Google and their recruitment is ahead of Google.

Microsoft is Better Place to Work than Google

At my interviews I was asking my interviewers in both Microsoft and Google a lot about the development process, engineering and technologies. I was asking also my colleagues working in these companies. I found for myself that Microsoft is better organized, managed and structured. Microsoft do software development in more professional way than Google. Their engineers are better. Their development process is better. Their products are better. Their technologies are better. Their interviews are better. Google was like a kindergarten - young and not experienced enough people, an office full of fun and entertainment, interviews typical for junior people and lack of traditions in development of high quality software products.

Based on my observations, I have a theory that Google's big problem is that the company hasn't realized that it isn't a startup anymore. This disconnect between the company's status and its perception of itself manifests in a number of ways:

  1. Startups don't have a career path for their employees. Does anyone at Facebook know what they want to be in five years besides rich? However, once riches are no longer guaranteed and the stock isn't firing on all cylinders (GOOG is underperforming both the NASDAQ and the Dow Jones Industrial Average this year), you need to have a better career plan for your employees that goes beyond "free lunches and all the foosball you can handle".

  2. There is no legacy code at a startup. When your code base is young, it isn't a big deal to have developers checking in new features after an overnight coding fit powered by caffeine and pizza. For the most part, the code base shouldn't be large enough or interdependent enough for one change to cause issues. However, it is practically a law of software development that the older your code gets, the more lines of code it accumulates and the more closely coupled your modules become. This means changing things in one part of the code can have adverse effects in another.

    As all organizations mature they tend to add PROCESS. These processes exist to insulate the companies from the mistakes that occur after a company gets to a certain size and can no longer trust its employees to always do the right thing. Requiring code reviews, design specifications, black box, white box and unit testing, usability studies, threat models, etc. are all the kinds of overhead that differentiate a mature software development shop from a "fly by the seat of your pants" startup. However, once you've been through enough fire drills, some of those processes don't sound as bad as they once did. This is why senior developers value them while junior developers don't, since the latter haven't been around the block enough.

  3. There is less politics at a startup. In any activity where humans have to come together collaboratively to achieve a goal, there will always be people with different agendas. The more people you add to the mix, the more agendas you have to contend with. Doing things by consensus is OK when you have to get consensus from two or three people who sit in the same hallway as you. It's a totally different ball game when you need to gain it from lots of people from across a diverse company working on different projects in different regions of the world who have different perspectives on how to solve your problems. At Google, even hiring an undergraduate candidate has to go through several layers of committees, which means hiring managers need to possess some political savvy if they want to get their candidates approved. The founders of Dodgeball quit Google after their startup was acquired because they realized that they didn't have the political savvy to get resources allocated to their project.

The fact that Google is having problems retaining employees isn't news; Fortune wrote an article about it just a few months ago. The technology press makes it seem like people are ditching Google for hot startups like FriendFeed and Facebook. However, the truth is more nuanced than that. Now that Google is just another big software company, lots of people are comparing it to other big software companies like Microsoft and finding it lacking.

Now Playing: Queen - Under Pressure (feat. David Bowie)


 

Categories: Life in the B0rg Cube

Last week TechCrunch UK wrote about a search startup named True Knowledge that utilizes AI/Semantic Web techniques. The post, entitled VCs price True Knowledge at £20m pre-money. Is this the UK's Powerset?, stated:

The chatter I’m hearing is that True Knowledge is being talked about in hushed tones, as if it might be the Powerset of the UK. To put that in context, Google has tried to buy the Silicon Valley search startup several times, and they have only launched a showcase product, not even a real one. However, although True Knowledge and Powerset are similar, they are different in significant ways, more of which later.
...
Currently in private beta, True Knowledge says their product is capable of intelligently answering - in plain English - questions posed on any topic. Ask it if Ben Affleck is married and it will come back with "Yes" rather than lots of web pages which may or may not have the answer (don’t ask me!).
...
Here's why the difference matters. True Knowledge can infer answers that the system hasn't seen. Inferences are created by combining different bits of data together. So for instance, without knowing the answer it can work out how tall the Eiffel Tower is by inferring that it is shorter than the Empire State Building but higher than St Paul's Cathedral.
...
AI software developer and entrepreneur William Tunstall-Pedoe is the founder of True Knowledge. He previously developed a technology that can solve commercially published crossword clues but also explain how the clues work in plain English. See the connection?

The scenarios described in the TechCrunch write up should sound familiar to anyone who has spent any time around fans of the Semantic Web. Creating intelligent agents that can interrogate structured data on the Web and infer new knowledge has turned out to be easier said than done, because for the most part content on the Web isn't organized according to the structure of the data. This is primarily due to the fact that HTML is a presentational language. Of course, even if information on the Web were structured data (i.e. idiomatic XML formats), we would still need to build machinery to translate between all of these XML formats.

Finally, in the few areas on the Web where structured data in XML formats is commonplace such as Atom/RSS feeds for blog content, not a lot has been done with this data to fulfill the promise of the Semantic Web.

So if the Semantic Web is such an infeasible utopia, why are more and more search startups using that as the angle from which they will attack Google's dominance of Web search? The answer can be found in Bill Slawski's post from a year ago entitled Finding Customers Through Anti-Commercial Queries, where he wrote:

Most Queries are Noncommercial

The first step might be to recognize that most queries conducted by people at search engines aren't aimed at buying something. A paper from the WWW 2007 held this spring in Banff, Alberta, Canada, Determining the User Intent of Web Search Engine Queries, provided a breakdown of the types of queries that they were able to classify.

Their research uncovered the following numbers: "80% of Web queries are informational in nature, with about 10% each being navigational and transactional." The research points to the vast majority of searches being conducted for information gathering purposes. One of the indications of "information" queries that they looked for were searches which include terms such as: “ways to,” “how to,” “what is.”

Although the bulk of the revenue search engines make is from people performing commercial queries such as searching for "incredible hulk merchandise", "car insurance quotes" or "ipod prices", these are actually a tiny proportion of the kinds of queries people want answered by search engines. The majority of searches are about the five Ws (and one H), namely "who", "what", "where", "when", "why" and "how". Such queries don't really need a list of Web pages as results; they simply require an answer. The search engine that can figure out how to always answer user queries directly on the page without making the user click on half a dozen pages to figure out the answer will definitely have moved the needle when it comes to the Web search user experience.

This explains why scenarios that one usually associates with AI and Semantic Web evangelists are now being touted by the new generation of "Google-killers". The question is whether knowledge inference techniques will prove to be more effective than traditional search engine techniques when it comes to providing the best search results especially since a lot of the traditional search engines are learning new tricks.

Now Playing: Bob Marley - Waiting In Vain


 

Categories: Technology

At the end of February of this year, I wrote a post entitled No Contest: FriendFeed vs. The Facebook News Feed, where I argued that it would be a two-month project for an enterprising developer at Facebook to incorporate all of the relevant features of FriendFeed that certain vocal bloggers had found so enticing. Since then we've had two announcements from Facebook.

From A new way to share with friends on April 15th

we've introduced a way for you to import activity from other sites into your Mini-Feed (and into your friends' News Feeds).

The option to import stories from other sites can be found via the small "Import" link at the top of your Mini-Feed. Only a few sites—Flickr, Yelp, Picasa, and del.icio.us—are available for importing at the moment, but we'll be adding Digg and other sites in the near future. These stories will look just like any other Mini-Feed stories, and will hopefully increase your ability to share information with the people you care about.

From We're Open For Commentary on June 25th (yesterday)

In the past, you've been able to comment on photos, notes and posted items, but if there was something else on your friend's profile—an interesting status, or a cool new friendship—you'd need to send a message or write a Wall post to talk about it. But starting today, you can comment on your friends' Mini-Feed stories right from their profile.

Now you can easily converse around friends' statuses, application stories, new friendships, videos, and most other stories you see on their profile. Just click on the comment bubble icon to write a comment or see comments other people have written.

It took a little longer than two months, but it looks like I was right. For some reason Facebook isn't putting the comment bubbles in the News Feed, but I assume that is only temporary and that they are trying it out in the Mini-Feed first.

FriendFeed has always seemed to me to be a weird concept for a standalone application. Why would I want to go to a whole new site and create yet another friend list just to share what I'm doing on the Web with my friends? Isn't that what social networking sites are for? It just sounds so inconvenient, like carrying around a pager instead of a mobile phone.

As I said in my original post on the topic, all FriendFeed has going for it is the community that has built up around the site, especially since the functionality it provides can be easily duplicated and actually fits better as a feature of an existing social networking site. The question is whether that community is the kind that will grow into making it a mainstream success or whether it will remain primarily a playground for Web geeks despite all the hype (see del.icio.us as an example of this). So far, the chances of the latter seem strong. For comparison, consider the growth curve of Twitter against that of FriendFeed on Google Trends and Alexa. Which seems more likely to one day have the brand awareness of a Flickr or a Facebook?

Now Playing: Bob Marley - I Shot The Sheriff


 

Categories: Social Software

Jason Kincaid over at TechCrunch has a blog post entitled Microsoft's First Step In Accepting OpenID SignOns - HealthVault, where he writes:

Over 16 months after first declaring its support for the OpenID authentication platform, Microsoft has finally implemented it for the first time, allowing for OpenID logins on its Health Vault medical site. Unfortunately, Health Vault will only support authentication from two OpenID providers: Trustbearer and Verisign. Whatever happened to the Open in OpenID?

The rationale behind the limited introduction is that health is sensitive, so access should be limited to the few, most trusted OpenID providers. It certainly makes sense, but it also serves to underscore one of the problems inherent to OpenID: security
...
But it seems that the platform itself may be even more deserving of scrutiny. What good is a unified login when its default form will only be accepted on the least private and secure sites?

A while back I mentioned that the rush to slap "Open" in front of every new spec written by a posse of Web companies had created a world where "Open" had devolved into a PR marketing term with no real meaning, since the term was being used too broadly to define different sorts of "openness". In the above case, the "open" in OpenID has never meant that every service that accepts OpenIDs needs to accept them from every OpenID provider.

Simon Willison, who's been a key evangelist of OpenID, has penned an insightful response to Jason Kincaid's article in his post The point of "Open" in OpenID, which is excerpted below:

TechCrunch report that Microsoft are accepting OpenID for their new HealthVault site, but with a catch: you can only use OpenIDs from two providers: Trustbearer (who offer two-factor authentication using a hardware token) and Verisign. "Whatever happened to the Open in OpenID?", asks TechCrunch’s Jason Kincaid.

Microsoft’s decision is a beautiful example of the Open in action, and I fully support it.

You have to remember that behind the excitement and marketing OpenID is a protocol, just like SMTP or HTTP. All OpenID actually provides is a mechanism for asserting ownership over a URL and then “proving” that assertion. We can build a pyramid of interesting things on top of this, but that assertion is really all OpenID gives us (well, that and a globally unique identifier). In internet theory terms, it’s a dumb network: the protocol just concentrates on passing assertions around; it’s up to the endpoints to set policies and invent interesting applications.
...
HealthVault have clearly made this decision due to security concerns—not over the OpenID protocol itself, but the providers that their users might choose to trust. By accepting OpenID on your site you are outsourcing the security of your users to an unknown third party, and you can’t guarantee that your users picked a good home for their OpenID. If you’re a bank or a healthcare provider that’s not a risk you want to take; whitelisting providers that you have audited for security means you don’t have to rule out OpenID entirely.

The expectation that services would have to create a white list of OpenID providers is not new thinking. Tim Bray blogged as much in his post on OpenID over a year ago, where he speculated that there would eventually be a market for rating OpenID providers so companies wouldn't have to individually audit each OpenID provider before deciding which ones to add to their white list.

As more companies decide to accept OpenID as a login mechanism on their services, I suspect that either the community or some company will jump in to fill the niche that Tim Bray speculated about in his post. I can't imagine that it is fun having to audit all of the OpenID providers as part of deciding how people will log in to your site, nor does it make sense that everyone who plans to support OpenID audits the same list of services for security. That sounds like a ton of wasted man hours when it can just be done once and the results shared by all.

Now Playing: Big & Rich - Save a Horse (Ride a Cowboy)


 

Categories: Web Development

Over the past few months there have been a number of posts about how aggregators like FriendFeed are causing bloggers to "lose control of the conversation". Louis Gray captured some of the blogger angst about this topic in his post Should Fractured Feed Reader Comments Raise Blog Owners' Ire?, where he wrote:

While the discussion around where a blog's comments should reside has raised its head before, especially around services like FriendFeed, (See: Sarah Perez of Read Write Web: Blog Comments Still Matter) it flared up again this afternoon when I had (innocently, I thought) highlighted how one friend's blog post from earlier in the week was getting a lot of comments, and had become the most popular story on Shyftr, a next-generation RSS feed reader that enables comments within its service.

While I had hoped the author (Eric Berlin of Online Media Cultist, who I highlighted on Monday and like quite a bit) would be pleased to see his post had gained traction, the reaction was not what I had expected. He said he was uneasy about seeing his posts generate activity and community for somebody else. Another FriendFeed user called it "content theft" and said "if they ever pull my feed and use it there, they can expect to get hit with a DMCA take-down notice". (See the discussion here)

Surprisingly [at least to me] these aren't the only instances where people have become upset because there are more comments happening on FriendFeed than on their post. Colin Walker tells the story of Rob La Gesse, who signed up for FriendFeed only to cancel his account because his "friends" on the site preferred commenting on FriendFeed rather than on his blog.

I suspect that a lot of the people expressing outrage are new to blogging, which is why they expect that their blog comments are the be-all and end-all of conversation about their blog posts. This has never been the case. For one, blogs have had to contend with social news sites like Slashdot, Digg and reddit where users can submit stories and then comment on them. A post may have a handful of comments on the original blog but generate dozens or hundreds of responses on a social news site. For example, I recently wrote about functional programming in C# 3.0 and while there were fewer than 10 comments on my blog there were over 150 comments in the discussion of the post on reddit.

Besides social news sites, there are other bloggers to consider. People with their own blogs often prefer blogging a response to your post instead of leaving a comment on the original post. This is the reason services like Technorati and technologies like Trackback were invented. Am I "stealing the conversation" from Louis Gray's post by writing this blog post in response to his instead of leaving a comment?

Then there's email, IM and other forms of active sharing. I've lost count of the number of times that people have told me that one of my blog posts was circulated around their group and a lively conversation ensued. Quite often, the referenced post has no comments.

In short, bloggers aren't losing control of the conversation due to services like FriendFeed because they never had it in the first place. You can't lose what you don't have.

When it comes to FriendFeed there are two things I like about the fact that they enable comments on items. The first is that it is good for their users since it provides a place to chat about content they find on the Web without having to send out email noise (i.e. starting conversations via passive instead of active sharing). The second is that it is good for FriendFeed because it builds network effects and social lock-in into their product. Sure, anyone can aggregate RSS feeds from Flickr/del.icio.us/YouTube/etc (see SocialThing, Facebook Import, Grazr, etc), but not everyone has the community that has been built around the conversations on FriendFeed.

Now Playing: Lloyd Banks - Born Alone, Die Alone


 

Categories: Social Software

After talking about it for months, we finally have an alpha version of the next release of RSS Bandit codenamed Phoenix. There are two key new features in this release. The first is that we've finally finished off the last of the features related to downloading podcasts. If you go to View->Download Manager you can now view and manage pending downloads of podcasts/enclosures as shown in the screen shot below.

The second feature is one I'm sure will be appreciated by people who like reading their feeds from multiple computers but still want a desktop-based feed reader for a variety of reasons (e.g. reading feeds from a corporate intranet while roaming your feeds from the public Web). With this feature you can have multiple feed lists which are synchronized with a Web-based feed reader such as Google Reader or NewsGator Online while still keeping some feeds local. All you need to do is go to File->Synchronize Feeds and follow the steps as shown in the screen shots below.

After selecting the option to synchronize feeds, you are taken to a wizard which presents the feed sources you can synchronize with and obtains your user credentials if necessary.

Once you have given the wizard your information, your feed list is synchronized and every action you make in RSS Bandit, such as subscribing to new feeds, unsubscribing from existing feeds, renaming feeds or marking items as read, is reflected in your Web-based feed reader of choice. The experience is intended to mirror the experience of using a desktop mail client in concert with a Web-based email service, and it should work as expected.

In addition, you can also use Google Reader's sharing feature (or NewsGator's clipping feature) directly from RSS Bandit as shown below.

I'm sure you're wondering where you can download this version of RSS Bandit and try it out for yourself. Get it here. There are two files in the installer package; I suggest running setup.exe because that validates that you have the correct prerequisites to run the application and tells you where to get them otherwise.

Please note that this is alpha quality software, so although it is intended to be fully functional (except for searching within your subscriptions) you should expect it to be buggy. If you have any problems, feel free to file a bug on SourceForge or ask a question on our forum.

PS: If you are an existing RSS Bandit user I'd suggest backing up your application data folder just in case. On a positive note, we've fixed dozens of bugs from previous versions.

Now Playing: Young Buck - Puff Puff Pass (ft. Ky-Mani Marley)


 

Categories: RSS Bandit

When adding new features that dramatically change how users interact with your site, it is a good practice to determine up front if your service can handle these new kinds of interactions so you don't end up constantly disabling features due to the high load they incur on your site. 

A couple of weeks ago, the developers at Facebook posted an entry about the architecture of Facebook Chat (it's written in Erlang, OMFG!!!) and I was interested to see the discussion of how they tested the scalability of the feature to ensure they didn't create negative first impressions when they rolled it out due to scale issues or impact the availability of their main site. The relevant part of the post is excerpted below:

Ramping up:

The secret for going from zero to seventy million users overnight is to avoid doing it all in one fell swoop. We chose to simulate the impact of many real users hitting many machines by means of a "dark launch" period in which Facebook pages would make connections to the chat servers, query for presence information and simulate message sends without a single UI element drawn on the page. With the "dark launch" bugs fixed, we hope that you enjoy Facebook Chat now that the UI lights have been turned on.

The approach followed by Facebook encapsulates some of the industry best practices when it comes to rolling out potentially expensive new features on your site. The dark launch is a practice that we've used when launching features for Windows Live in the past. During a dark launch, the feature is enabled on the site but not actually shown in the user interface. The purpose of this is to monitor whether the site can handle the load of the feature during day-to-day interactions without necessarily exposing the feature to end users, in case it turns out the answer is no.

An example of a feature that could have been rolled out using a dark launch is the Replies tab on Twitter. A simple way to implement the @replies feature is to create a message queue (i.e. an inbox) for the user that contains all the replies they have been sent. To test if this approach was scalable, the team could have built this feature and had messages going into users' inboxes without showing the Replies tab in the UI. That way they could test the load on their message queue and fix bugs in it based on real user interactions without their users even knowing that they were load testing a new feature. If it turned out that they couldn't handle the load or they needed to beef up their message queuing infrastructure, they could disable the feature, change the implementation and retest quickly without exposing their flaws to users.
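To make the idea concrete, here is a minimal sketch of what a dark launch of such a replies feature might look like. This is purely illustrative and not Twitter's or Facebook's actual code; the flag, queue and type names (darkLaunchEnabled, IMessageQueue, etc.) are my own inventions.

// Hypothetical sketch of a dark launch: the backend work runs for every request
// so the infrastructure sees production-level load, but the feature's UI stays hidden.
public class ReplyService
{
    // In a real system these flags would come from configuration so they can be
    // flipped without redeploying; they are hard-coded here for brevity.
    private readonly bool darkLaunchEnabled = true;   // do the backend work
    private readonly bool showRepliesTabUI = false;   // but don't render the tab yet

    private readonly IMessageQueue repliesQueue;

    public ReplyService(IMessageQueue repliesQueue)
    {
        this.repliesQueue = repliesQueue;
    }

    public void OnStatusPosted(string author, string text)
    {
        // During the dark launch we still enqueue @replies so the message queue
        // is exercised by real traffic...
        if (darkLaunchEnabled && text.StartsWith("@"))
        {
            string recipient = text.Split(' ')[0].TrimStart('@');
            repliesQueue.Enqueue(recipient, author, text);
        }
    }

    public bool ShouldRenderRepliesTab()
    {
        // ...but no user sees the Replies tab until this flag is flipped.
        return showRepliesTabUI;
    }
}

public interface IMessageQueue
{
    void Enqueue(string recipient, string author, string text);
}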

The main problem with a dark launch is that it ignores the fact that users often use social features a lot differently than site developers expect. In the specific case of the Replies tab in Twitter, it is quite likely that the usage of "@username" replies would increase by a lot once the tab was introduced, since the feature increased the chance that the recipient would see the response compared to a regular tweet. So the load from the dark launch would not be the actual load from having the feature enabled. The next step, then, is to introduce the feature to a subset of your users using a gradual ramp-up approach.

During a gradual ramp-up, you release the feature to small groups of users, preferably self-contained groups, so you can see how users actually interact with the feature for real without bringing down your entire site if it turns out that their usage patterns greatly exceed your expectations. This is one of the reasons why Facebook Chat was gradually exposed to users from specific networks before being enabled across the entire site even though the feature had already been dark launched.
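A gradual ramp-up is often implemented as a simple gate keyed off the user or their network. Here is a minimal sketch of that idea; the network names and percentage threshold are made up, and this is not a description of how Facebook actually gated Chat.

using System;
using System.Collections.Generic;

// Hypothetical ramp-up gate: enable the feature for a whitelist of networks first,
// then for a slowly growing percentage of everyone else.
public static class ChatRampUp
{
    static readonly HashSet<string> enabledNetworks =
        new HashSet<string> { "Network A", "Network B" };  // example self-contained groups

    static int rampUpPercentage = 5;  // raised over time as confidence in the feature grows

    public static bool IsChatEnabled(string userId, string network)
    {
        if (enabledNetworks.Contains(network))
            return true;

        // Hash the user id so the same user consistently falls inside or outside the bucket.
        int bucket = (userId.GetHashCode() & 0x7fffffff) % 100;
        return bucket < rampUpPercentage;
    }
}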

Another common practice for preventing certain features from impacting your core user experience is to isolate your features from each other as much as possible. Although this should be a last resort, it is better if one or two features of your site do not work than if the entire site is down. The Twitter folks found this out the hard way when it turned out that traffic from their instant messaging interface was bringing down the site, and they finally resorted to disabling the ability to update Twitter via IM until further notice. Ideally your site's APIs and Web services should be isolated from your core features so that even if the APIs are going over capacity it doesn't mean your entire site is down. Instead, it would mean that at worst access to data via your site's APIs was unavailable. The same applies to ancillary features like the Replies tab in Twitter or Facebook Chat; if that service is overloaded it shouldn't mean the entire site is unavailable. This is one area where following the principles of service oriented architecture can save your Web site a lot of pain.

Now Playing: Earth, Wind & Fire - Let's Groove


 

Categories: Web Development

In the past year I've spent a lot of time thinking about hiring, due to a recent surge in the number of interviews I've participated in as well as a surge in the number of folks I know who've decided to "try new things". One thing I've noticed is that software companies and teams within large software companies like Microsoft tend to fall into two broad camps when it comes to hiring. There are the teams/companies that seem to attract tons of smart, superstar programmers like a refrigerator door attracts magnets, and then there are those that use the beachcomber technique of sifting through tons of poorly written resumes hoping to find someone valuable but often ending up with people who seem valuable but actually aren't (aka good at interviewing, lousy at actually getting work done).

Steve Yegge talks about this problem in his post Done, and Gets Things Smart, which is excerpted below:

The "extended interview" (in any form) is the only solution I've ever seen to the horrible dilemma, How do you hire someone smarter than you? Or even the simpler problem, How do you identify someone who's actually Smart, and Gets Things Done? Interviews alone just don't cut it.
Let me say it more directly, for those of you who haven't taken this personally yet: you can't do what Joel is asking you to do. You're not qualified. The Smart and Gets Things Done approach to interviewing will only get you copies of yourself, and the work of Dunning and Kruger implies that if you hire someone better than you are, then it's entirely accidental.
...
So let's assume you're looking at the vast ocean of programmers, all of whom are self-professed superstars who've gotten lots of "stuff" done, and you want to identify not the superstars, but the super-heroes. How do you do it? Well, Brian Dougherty of Geoworks did it somehow. Jeff Bezos did it somehow. Larry and Sergey did it somehow. I'm willing to bet good money that every successful tech company out there had some freakishly good seed engineers.
...
You can only find Done, and Gets Things Smart people in two ways, and one of them I still don't understand very well. The first way is to get real lucky and have one as a coworker or classmate. You work with them for a few years and come to realize they're just cut from a finer cloth than you and your other unwashed cohorts. You may be friends with some of them, which helps with the recruiting a little, but not necessarily. The important thing is that you recognize them, even if you don't know what makes them tick.
...
I think Identification Approach #2, and this is the one I don't understand very well, is that you "ask around". You know. You manually perform the graph build-and-traversal done by the Facebook "Smartest Friend" plug-in, where you ask everyone to name the best engineer they know, and continue doing that until it converges.

This jibes with my experience watching various software startups and knowing the history of various teams at Microsoft over the past few years. The products that seem to have hired the most phenomenal programmers and have achieved great things often start off with some person trying to hire the smartest person they know or knew from past jobs (Approach #1). Those people in turn try to attract the smartest people they've known and that happens recursively (Approach #2).

I remember a few years ago chatting with a coworker who mentioned that some Harvard-based startup was hiring super smart, young Harvard alumni from Microsoft and a couple of other technology companies at a rapid clip. It seems people were recommended by their friends at the startup and those folks would in turn come back to Microsoft/Google/etc to convince their ex-Harvard chums to come join in the fun. It turns out that startup was Facebook and since then the company has impressed the world with its output. Google used to have a similar approach to hiring until the company grew too big and had to start utilizing the beachcomber technique as well. I've also seen this technique work successfully for a number of teams at Microsoft.

Although this technique sounds unrealistic, it actually isn't as difficult as it once was thanks to the Web and social networking sites. It is now quite easy for people to stay in touch with or reconnect with people they knew from previous jobs or back in their school days. Thus the big barrier to adopting this approach to hiring isn't that employees won't have any recommendations for super-smart people they'd love to work with if given the chance. The real barrier is that most employers don't know how to court potential employees or, even worse, don't believe that they have to do so. Instead they expect people to want to work for them, which means they'll get a flood of awful resumes, put a bunch of candidates through the flawed interview process, eventually get tired of the entire charade, and finally hire the first warm body to show up after they reach their breaking point. All of this could be avoided if they simply leveraged the social networks of their best employees. Unfortunately, common sense is never as common as you expect it to be.

Now Playing: Soundgarden - Jesus Christ Pose


 

Categories: Life in the B0rg Cube

As a developer who was raised on procedural and object oriented programming languages like C, C++ and Java, it took me a while to figure out what people were raving about when it comes to the benefits of functional programming techniques. I always thought closures and higher order functions were words used by snobby kids from MIT and grad students to show how overeducated they were, as opposed to programming tools I'd ever find useful.

This thinking was additionally fueled by articles like Joel Spolsky's Can Your Programming Language Do This?, which not only came off as snobby but also cemented the impression that higher order functions like map() and reduce() are for people solving "big" problems like the folks at Google who are trying to categorize the entire World Wide Web, not people like me who write desktop feed readers in their free time.

All of this changed when I started learning Python.

With Python I started writing programs that threw around lambda functions and used list comprehensions to map, reduce and filter without even thinking twice about it. Afterwards, when I'd go back to programming in C# 2.0, I'd marvel at how much more code it took to get things done. There were tasks which I could perform in a line of Python code that took four, five, sometimes up to ten lines of C# code. I began to miss Python sorely.

Then I installed Visual Studio 2008 and got to use the Language Integrated Query (LINQ) features of C# 3.0 and was blown away. The C# folks had not only brought over functional programming constructs like lambda expressions (aka anonymous methods) but also had added the 3 core functions (map, reduce and filter) to all lists, collections and other implementers of the IEnumerable interface. So what are map, reduce and filter? They are higher order functions [which means they take functions as input] that operate on lists of objects. Here are their definitions from the Python documentation along with links to their C# 3.0 equivalents.

  • map: Apply function to every item of iterable and return a list of the results. C# 3.0 equivalent: Enumerable.Select

  • reduce (aka fold or accumulate): Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). The left argument, x, is the accumulated value and the right argument, y, is the update value from the iterable. If the optional initializer is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. C# 3.0 equivalent: Enumerable.Aggregate

  • filter: Construct a list from those elements of iterable for which function returns true. iterable may be either a sequence, a container which supports iteration, or an iterator. C# 3.0 equivalent: Enumerable.Where

With these three building blocks, you could replace the majority of the procedural for loops in your application with a single line of code. C# 3.0 doesn't just stop there. There are also a number of other useful higher order functions available on all enumerable/collection objects.
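To make these equivalents concrete, here is a small self-contained example (not taken from the RSS Bandit code base) showing map, filter and reduce written with their C# 3.0 counterparts:

using System;
using System.Collections.Generic;
using System.Linq;

class MapReduceFilterExample
{
    static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5 };

        // map -> Enumerable.Select: apply a function to every item
        IEnumerable<int> squares = numbers.Select(x => x * x);      // 1, 4, 9, 16, 25

        // filter -> Enumerable.Where: keep only items the predicate accepts
        IEnumerable<int> evens = numbers.Where(x => x % 2 == 0);    // 2, 4

        // reduce -> Enumerable.Aggregate: combine the items into a single value,
        // equivalent to Python's reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])
        int sum = numbers.Aggregate((acc, y) => acc + y);           // 15

        foreach (int square in squares)
            Console.Write(square + " ");
        Console.WriteLine();

        foreach (int even in evens)
            Console.Write(even + " ");
        Console.WriteLine();

        Console.WriteLine(sum);
    }
}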

In the next version of RSS Bandit, we will support synchronizing your subscription state from Google Reader, NewsGator Online and the Common Feed List provided by the Windows RSS platform. This means that when the user hits [Update All Feeds] to refresh their subscriptions, we need to (i) aggregate the unread item count across the different feed sources and store it, (ii) ask each feed source to kick off its update process, and (iii) on completion of the update, determine if there are new items by recalculating the unread count across all feed sources and seeing if it differs from the value we got in the first step. Here's what the UpdateAllFeeds() method looks like:

public void UpdateAllFeeds(bool force_download)
{
    List<SubscriptionRootNode> rootNodes = this.GetAllSubscriptionRootNodes();
    if (rootNodes != null)
    {
        if (_timerRefreshFeeds.Enabled)
            _timerRefreshFeeds.Stop();

        //(i) store the total unread count across all feed sources before refreshing
        _lastUnreadFeedItemCountBeforeRefresh = rootNodes.Sum(n => n.UnreadCount);

        //(ii) ask each feed source to kick off its update process
        FeedSources.Sources.ForEach(s => s.RefreshFeeds(force_download));
    }
}

In the UpdateAllFeeds() method we use Enumerable.Sum, which is a specialized reduce() function, to calculate the total unread count across the different subscription sources. Then we use a ForEach extension method to effectively loop through each feed source and call its RefreshFeeds() method. That would have been two for loops in older versions of C# or Java.
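For comparison, here is roughly what that same logic looks like without Sum() and ForEach(); note that FeedSourceEntry is only a guess at the element type of the FeedSources.Sources collection, so substitute whatever type it actually holds.

// Pre-LINQ equivalent of the two one-liners in UpdateAllFeeds() above.
// FeedSourceEntry is an assumed type name for the elements of FeedSources.Sources.
int unreadCount = 0;
foreach (SubscriptionRootNode node in rootNodes)
{
    unreadCount += node.UnreadCount;
}
_lastUnreadFeedItemCountBeforeRefresh = unreadCount;

foreach (FeedSourceEntry source in FeedSources.Sources)
{
    source.RefreshFeeds(force_download);
}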

We also perform more complicated reduce or fold operations which go outside the norm of just accumulating some numeric value in RSS Bandit. When a user subscribes to a new feed, we populate a drop down list with the list of categories from the user's subscriptions so the user can decide which category to place the feed in. With multiple feed sources, we need to populate the drop down with the list of categories used in Google Reader, NewsGator and the Windows Common Feed List, as well as those within RSS Bandit, while taking care to eliminate duplicates. The GetCategories() method shown below does the bulk of that work in a single line of code via Enumerable.Aggregate:

public IEnumerable<string> GetCategories()
{
  //list containing default category used for bootstrapping the Aggregate function
  var c = new List<string>();
  c.Add(DefaultCategory);
  IEnumerable<string> all_categories = c; 

  //get a list of the distinct categories used across all feed sources
  all_categories = FeedSources.Sources.Aggregate(all_categories, 
                                 (list, s) => list.Union(s.Source.GetCategories().Keys, StringComparer.InvariantCultureIgnoreCase));                        
  return all_categories;
}

The first step is to set up a list with the default category ("Unclassified") and then use Aggregate() to go through each source and perform a union of the current list of categories with the list of categories from that feed source. The categories are compared in a case insensitive manner to remove duplicates from the union. If there are no categories defined in any of the feed sources then only the default category ends up being returned.

When a user is viewing their Google Reader feeds in RSS Bandit, any action the user takes in the application is reflected on the Web. So each time a user marks an item as read, renames a feed title, or subscribes or unsubscribes from a feed, a Web request is made behind the scenes to update the user's state on the Web via Google Reader's REST API. Instead of making the Web requests synchronously and possibly tying up the UI, I add each Web request intended for the Google Reader API to a queue of pending operations. Since the operations may sit in the queue for a few seconds or minutes in the worst case, we can optimize network usage by removing events from the queue if they end up being redundant.

For example, the DeleteFeedFromGoogleReader() method removes every pending operation related to a particular feed if an unsubscribe event is enqueued. After all, there is no point in making Web requests to mark the feed as read or rename it if the next request from the user is to unsubscribe from the feed. The method uses a filter operation, Enumerable.Where, to determine the events to remove, as shown below:

       public void DeleteFeedFromGoogleReader(string googleUserID, string feedUrl)
        {
            var deleteOp = new PendingGoogleReaderOperation(GoogleReaderOperation.DeleteFeed, 
                                                            new object[] {feedUrl},googleUserID);

            lock (pendingGoogleReaderOperations)
            {
                //remove all pending operations related to the feed since it is going to be unsubscribed
                IEnumerable<PendingGoogleReaderOperation> ops2remove 
                  = pendingGoogleReaderOperations.Where(op => op.GoogleUserName.Equals(deleteOp.GoogleUserName)
                                                        && op.Parameters.Contains(feedUrl));

                foreach (PendingGoogleReaderOperation op2remove in ops2remove)
                {
                    pendingGoogleReaderOperations.Remove(op2remove); 
                }

                pendingGoogleReaderOperations.Add(deleteOp);   
            }
        }

There are more examples from the RSS Bandit code base but I'm sure you get the idea. The point is that functional programming techniques give you the ability to get more bang for your buck (where bucks are lines of code) even when performing the most mundane of tasks. 

If your programming language doesn't support lambda functions or have map/reduce/filter functions built in, you just might be a Blub Programmer who is missing out on being more productive because your programming language doesn't support "esoteric" or "weird" features.

My next step is to spend more time with Lisp. Wish me luck. :)

Now Playing: Lil Wayne - Lollipop (remix) (feat. Kanye West)


 

Categories: Programming

Several months ago, danah boyd wrote a rather insightful post entitled one company, ten brands: lessons from retail for tech companies, which contained the following pieces of wisdom:

Lots of folks are unaware that multiple brands are owned by the same company (e.g., the same company owns Gap, Banana Republic, Old Navy). Consumer activists often complain that this practice is deceptive because it tricks consumers into believing that there are big distinctions between brands when, often, the differences are minimal. Personally, while I'd love to see more consumer brand awareness, but I think that brand distinctions play an important role. I just wish that the tech industry would figure this out.
...
Unfortunately, I don't think that many companies are aware of the limitations of their brands. When they're flying high, their brands are invincible and extending it to a wide array of products seems natural. Yet, over time, tech companies' brands get entrenched. Certain users identify with it; others don't. New products using that brand enter into the market with both cachet and baggage. Yet, tech companies tend to hold onto their brands for dear life and assume users will forget. Foolish.
...
teens also have plenty to say about the brands themselves. Yahoo! and AOL, for example, are for old people. When I asked why they use Yahoo! Mail and AOL Instant Messaging if they're for old people, they responded by telling me that their parents made those accounts for them. Furthermore, email is for communicating with old people and AIM is "so middle school" and both are losing ground to SNS and SMS. While Microsoft is viewed in equally lame light amongst youth I spoke with, it's at least valued as a brand for doing work. Yet, even youth who use MSN messenger think that msn.com is for old people. Why shouldn't they? When I logged in just now, the main visual was a woman with white hair sitting on a hospital bed with the caption "10 Vital Questions to Ask Your Doctor."
...
I would like to offer two bits of advice to all of the major tech companies out there: 1) Start sub-branding; and 2) Start doing real personalization.

If you're creating a new product, launch it with a new brand. Put your flagship brand on the bottom of the page, letting people know that this is backed by you - this is not about deception. Advertise it alongside your flagship brand if you think that'll gain you traction. But let the new product develop a life of its own and not get flattened by a universal brand... If you're buying a well-established brand, don't flatten it, especially if it's loved by youth. Kudos to Google wrt YouTube; boo to Yahoo! wrt Launch. Even at the coarse demographic level, people are different; don't treat them as a universal bunch, even if your back-end serves up the same thing to different interfaces.

As danah boyd points out above, as companies enter new markets they bring their brand baggage along with them. When the brand doesn't mesh with the target audience, it is hard to get traction. Creating new brands that are distanced from the established brand is often a good idea in this case. An excellent example of this is Microsoft's branding strategy with XBox. With XBox, Microsoft created a new brand that distanced itself from the company's staid office productivity and accounting software roots but still let people know that the software powerhouse was behind the brand (notice how there is no mention of Microsoft until you scroll to the bottom of XBox.com?).

But why did Microsoft need to create a new brand in the first place? Why couldn't it have just been called the Windows Gaming Console or the "Microsoft Gaming Console"? You should be able to figure out the answers to these questions if you are familiar with the 22 Immutable Laws of Branding. I particularly like laws #2 and #10, excerpted below:

The Law of Contraction: A brand becomes stronger when you narrow its focus. By narrowing the focus to a single category, a brand can achieve extraordinary success. Starbucks, Subway and Dominos Pizza became category killers when they narrowed their focus.

The Law of Extensions: The easiest way to destroy a brand is to put its name on everything. More than 90% of all new product introductions in the U.S. are line extensions. Line extensions destroy brand value by weakening the brand. The effects can be felt in diminished market share of the core brand, a loss of brand identity, and a cannibalization of one's own sales. Often, the brand extension directly attacks the strength of the core brand. Does Extra Strength Tylenol imply that regular Tylenol isn't strong enough?

Historically, software companies have built brands based on what their customers want to do instead of who their customers are. So we've ended up with a lot of task-based brands like Google™ for Web searching, Adobe Photoshop™ for photo editing, or Microsoft PowerPoint™ for creating presentations. These brands come from a world where software is utilitarian and is simply a tool for getting things done, as opposed to being an integral part of people's identities and lifestyle. This means that a lot of software companies don't have experience building brands around people's personal experiences and background. With the rise of social software, we've entered a world where software is no longer just a tool for individual tasks but a key part of how millions of people interact with each other and present themselves every day. The old rules no longer apply.

In today's world, the social software you use says as much about you as the brand of clothes you wear or the kind of watch you rock. The average LinkedIn user is different from the average Facebook user, who is different from the average MySpace user, even though they are all social networking sites. Like weekend warriors who work a boring 9-5 during the week and get crunk on the weekends, people who utilize multiple social networking sites often do so to express different sides of their personality or to interact with different sets of friends, as opposed to going back and forth based on the features of the sites.

This means that the utilitarian software brand doesn't really work well in this world. It isn't about having the best features or being the best site for social networking; it is about being the best place for me and my friends to hang out online. When put in those terms, it is unsurprising that social networking sites are often dominant in specific geographic regions, with no one site being globally dominant.

All of this is a long winded way of saying that sticking to a single brand, even if it is just the company name, gets in the way of breaking into new markets when it comes to "Web 2.0". Slapping Google or Yahoo! in front of a brand may make it more likely to be used by a certain segment of the population, but it also places constraints on what can be done with those services due to people's expectations of the brand. There is a reason why Flickr eventually killed Yahoo! Photos and why it was decided that Google Video be relegated to being a search brand while YouTube would be the social sharing brand. The brand baggage and the accompanying culture made the older properties road kill.

This is one situation where startups have an inherent advantage over the established Web players because they don't have any brand baggage holding them back. It is easy to be nimble and try out new things when there are no fixed expectations from your product team or your users about what your application is supposed to be.

With their recent acquisitions, the established Web players like Yahoo! and Google are learning what other industries have learned over time: sometimes it pays to have different brands for different audiences.

NOTE: Creating different brands for different audiences is not the same as having lots of overlapping brands with  unclear differentiation.

Now Playing: Outkast - Hollywood Divorce (feat. Lil' Wayne & Snoop Doggy Dogg)


 

Categories: Ramblings | Social Software

People who've been following me on Twitter know that I've been spending my weekends turning RSS Bandit into a desktop client for Google Reader and NewsGator Online. Although I was familiar with NewsGator's SOAP API, I didn't have the patience to figure out how consuming SOAP services had changed in Visual Studio 2008, given that the last time I tried it I was using Visual Studio 2003 and there seemed to be fairly significant differences. In addition, the chance to program against a fully featured RESTful API also piqued my curiosity. For these reasons I decided to use the NewsGator REST API when working on the features for synchronizing with NewsGator Online.

After I completed work on the synchronization with NewsGator Online, it was quite clear to me that the developer(s) who designed the NewsGator REST API didn't really understand the principles behind building RESTful services. For the most part, it looks like a big chunk of the work done to create a REST API was to strip the SOAP envelopes from their existing SOAP services and then switch to URL parameters instead of using SOAP messages for requests. This isn't REST, it is POX/HTTP (Plain Old XML over HTTP). Although this approach gets you out of the interoperability and complexity tax which you get from using XSD/SOAP/WS-*, it doesn't give you all of the benefits of conforming to the Web's natural architecture.

So I thought it would be useful to take a look at some aspects of the NewsGator REST API and see what is gained/lost by making them more RESTful.

What Should a Feed Reader's REST API Look Like?

Before the NewsGator REST API can be critiqued, it is a good idea to have a mental model of what the API would look like if it were RESTful. To do this, we can follow the steps in Joe Gregorio's How to Create a REST Protocol, which covers the four decisions you must make as part of the design process for your service:

  • What are the URIs? [Ed note - This is actually  "What are the resources?" ]
  • What's the format?
  • What methods are supported at each URI?
  • What status codes could be returned?

There are two main resources or data types in a feed reader; feeds and subscription lists. When building RESTful protocols, it pays to reuse standard XML formats for your message payloads since it increases the likelihood that your clients will have libraries and tools for dealing with your payloads. Standard formats for feeds are RSS and Atom (either is fine although I'd prefer to go with Atom) and a standard format for feed lists is OPML. In addition, there are bits of state you want to associate with feeds (e.g. what items have been read/unread/flagged/etc) and with subscription lists (e.g. username/password of authenticated feeds, etc). Since the aforementioned XML formats are extensible, this is not a problem to accommodate.

What methods to support also seems straightforward. You will want full Create, Retrieve, Update and Delete operations on the subscription list, so you will need the full complement of POST, GET, PUT and DELETE on subscription lists. For feeds, you will only need to fetch them and update the user-specific state associated with each feed, such as whether an item has been read or flagged. So you'd want to support GET and PUT/POST.

As for what status codes to return, you would probably want at least 200 OK (success), 304 Not Modified (feed hasn't changed since the last fetch), 400 Bad Request (invalid or missing request parameters), 401 Unauthorized (invalid or no authentication credentials provided) and 404 Not Found (self explanatory).

This seems like a textbook example of a situation that calls for using the Atom Publishing Protocol (AtomPub). The only wrinkle is that AtomPub requires uploading the entire document when updating the state of an item or media resource. This is pretty wasteful when updating the state of an item in a feed (e.g. indicating that the user has read the item) since you'd end up re-uploading all the content of the item just to change the equivalent of a flag on it. There have been a number of workarounds proposed for this limitation of AtomPub, such as the HTTP PATCH Internet Draft and Astoria's introduction of a MERGE HTTP method. In thinking about this problem, I'm going to assume that the ideal service supports partial updates of feed and subscription documents using PATCH/MERGE or something similar.
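
To make the partial update idea concrete, here is a rough sketch of what marking an item as read could look like from a .NET client talking to such an idealized service. The item URI, the <is-read> extension element and the choice of the PATCH verb are all assumptions made for illustration; they aren't part of any shipping API.

using System;
using System.IO;
using System.Net;
using System.Text;

class PartialUpdateSketch
{
    static void Main()
    {
        // Hypothetical resource URI for a single item in a user's feed
        string itemUri = "http://example.org/reader/feeds/1234/items/5678";

        // A partial Atom entry carrying only the state change we care about;
        // <x:is-read> is an imaginary extension element used for illustration
        string patchDocument =
            "<entry xmlns='http://www.w3.org/2005/Atom' xmlns:x='http://example.org/reader'>" +
            "  <x:is-read>true</x:is-read>" +
            "</entry>";

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(itemUri);
        request.Method = "PATCH"; // or MERGE, depending on which workaround the service adopts
        request.ContentType = "application/atom+xml";

        byte[] body = Encoding.UTF8.GetBytes(patchDocument);
        request.ContentLength = body.Length;
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine("Server returned: {0}", response.StatusCode);
        }
    }
}

The key point is that the client only ships the changed bits of state instead of re-uploading the entire item.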

Now we have enough of a background to thoughtfully critique some of the API design choices in the NewsGator API.

REST API Sin #1: URIs Identify Methods Instead of Resources (The Original Sin)

When a lot of developers think of REST, they do not think of the REpresentational State Transfer (REST) architectural style as described in chapter five of Roy Fielding's Ph.D. thesis. Instead they think of exposing remote procedure calls (RPC) over the Web using plain old XML payloads, without schemas and without having to use an interface definition language (i.e. XML-based RPC without the encumbrance of SOAP, XSD or WSDL).

The problem with this thinking is that it often violates the key idea behind REST. In a RESTful Web service, the URIs identify the nouns (aka resources) that occur in the service and the verbs are the HTTP methods that operate on those resources. In an RPC-style service, the URLs identify verbs (aka methods) and the nouns are the parameters to these methods. The problem with the RPC style design is that it increases the complexity of the clients that interact with the system. Instead of a client simply knowing how to interact with a single resource (e.g. an RSS feed) and then signifying changes in state by the addition/removal of data in the document (e.g. adding an <is-read>true</is-read> element as an extension to indicate an item is read), it has to specifically target each of the different behaviors a system supports for that item explicitly (e.g. client needs to code against markread(), flagitem(), shareitem(), rateitem(), etc). The reduced surface area of the interface is not only a benefit to the client but to the service as well.  

Below are a few examples that contrast the approach taken by the NewsGator API with a more RESTful approach based on AtomPub.

1a.) Modifying the name of one of the locations a user synchronizes data from using the NewsGator REST API

Service URL: http://services.newsgator.com/ngws/svc/Location.aspx/<locationId>/changelocationnamebylocationid
Method: POST
Request Format: application/x-www-form-urlencoded
Payload: "locationname=<new location name>"
Response: No response

 

1b.) Modifying the name of one of the locations a user synchronizes data from using a more RESTful approach

Service URL: http://services.newsgator.com/ngws/svc/Location.aspx/<locationId>/
Method: POST
Request Format: application/xml
Payload:

<opml xmlns:ng="http://newsgator.com/schema/opml">
  <head />
  <body>
    <outline text="<location name>" ng:isPublic="<True|False>" ng:autoAddSubs="<True|False>" />
  </body>
</opml>

Response: No response

2a.) Deleting a folder using the NewsGator REST API

Service URL: http://services.newsgator.com/ngws/svc/Folder.aspx/delete/
Method: DELETE
Request Format: application/x-www-form-urlencoded
Payload: "fld=<NewsGator folder id>"
Response: No response

2b.) Deleting a folder using a more RESTful approach

Service URL: http://services.newsgator.com/ngws/svc/Folder.aspx/<folder-id>
Method: DELETE
Request Format: Not applicable
Payload: Not applicable
Response: No response

3a.) Retrieve a folder or create it if it doesn't exist using the NewsGator REST API

Service URL: http://services.newsgator.com/ngws/svc/Folder.aspx/getorcreate
Method: POST
Request Format: application/x-www-form-urlencoded
Payload: "parentid=<NewsGator folder id>&name=<folder name>&root=<MYF|MYC>"
Response:

<opml xmlns:ng="http://newsgator.com/schema/opml">
  <head>
    <title>getorcreate</title>
  </head>
  <body>
    <outline text="<folder name>" ng:id="<NewsGator folder id>" />
  </body>
</opml>

3b.) Retrieve a folder or create it if it doesn't exist using a more RESTful approach (the approach below supports creating multiple and/or nested folders in a single call)

Service URL: http://services.newsgator.com/ngws/svc/Folder.aspx/<root>/<folder-id>
Method: POST
Request Format: application/xml
Payload:

<opml xmlns:ng="http://newsgator.com/schema/opml">
  <head />
  <body>
    <outline text="<folder name>" />
  </body>
</opml>

Response:

<opml xmlns:ng="http://newsgator.com/schema/opml">
  <head>
    <title>getorcreate</title>
  </head>
  <body>
    <outline text="<folder name>" ng:id="<NewsGator folder id>" />
  </body>
</opml>

REST API Sin #2: Not Providing a Consistent Interface (A Venial Sin)

In REST, all resources are interacted with using the same generic and uniform interface (i.e. via the HTTP methods - GET, PUT, DELETE, POST, etc). However it defeats the purpose of exposing a uniform interface if resources in the same system respond very differently when interacting with them using the same HTTP method. Consider the following examples taken from the NewsGator REST API.

1.) Deleting a folder using the NewsGator REST API

Service URL: http://services.newsgator.com/ngws/svc/Folder.aspx/delete
Method: DELETE
Request Format: application/x-www-form-urlencoded
Payload: "fld=<NewsGator folder id>"
Response: No response

2.) Deleting a feed using the NewsGator REST API

Service URL: http://services.newsgator.com/ngws/svc/Subscription.aspx
Method: DELETE
Request Format: application/xml
Payload:

<opml xmlns:ng="http://newsgator.com/schema/opml">
  <head>
    <title>delete</title>
  </head>
  <body>
    <outline text="subscription title" ng:id="<NewsGator subscription id>" ng:statusCode="<0|1>" />
  </body>
</opml>

Response: No response

From the end user or application developer's perspective the above actions aren't fundamentally different. In both cases, the user is removing part of their subscription list which in turn deletes some subset of the <outline> elements in the user's subscriptions OPML file.

However the NewsGator REST API exposes this fundamentally identical task in radically different ways for deleting folders versus subscriptions. The service URL is different. The request format is different. Even the response is different. There's no reason why all three of these can't be the same for both folders and subscriptions in a user's OPML feed list. Although it may seem like I'm singling out the NewsGator REST API, I've seen lots of REST APIs that have similarly missed the point when it comes to using REST to expose a uniform interface to their service.
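
To see the client-side payoff of a uniform interface, here is a quick sketch of what deleting entries from the subscription list could look like if every folder and feed were addressable by its own URI. The URIs below are made up for illustration; the point is that a single code path handles both cases.

using System;
using System.Net;

class UniformDeleteSketch
{
    // With a uniform interface, deleting any resource is the same operation;
    // the client doesn't need separate code paths for folders vs. subscriptions.
    static void DeleteResource(string resourceUri, ICredentials credentials)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(resourceUri);
        request.Method = "DELETE";
        request.Credentials = credentials;

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine("{0} -> {1}", resourceUri, response.StatusCode);
        }
    }

    static void Main()
    {
        ICredentials credentials = new NetworkCredential("username", "password");

        // Hypothetical resource URIs for a folder and a feed subscription
        DeleteResource("http://example.org/reader/folders/42", credentials);
        DeleteResource("http://example.org/reader/subscriptions/1337", credentials);
    }
}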

Conclusion

These aren't the only mistakes developers make when designing a REST API, just the most common ones. They are often a sign that the developers simply ported an old or legacy API without actually trying to make it RESTful. This is clearly the case with the NewsGator REST API, which is obviously a thin veneer over the NewsGator SOAP API.

If you are going to build a RESTful API, do it right. Your developers will thank you for it.

Now Playing: Montell Jordan - This is How We Do It


 

Categories: XML Web Services

Disclaimer: This post does not reflect the opinions, thoughts, strategies or future intentions of my employer. These are solely my personal opinions. If you are seeking official position statements from Microsoft, please go here.

About a month ago Joel Spolsky wrote a rant about Microsoft's Live Mesh project which contained some interesting criticism of the project and showed that Joel has a personal beef with the Googles & Microsofts of the world for making it hard for him to hire talented people to work for his company. Unsurprisingly, lots of responses focused on the latter since it was an interesting lapse in judgement for Joel to inject his personal frustrations into what was meant to be a technical critique of a software project. However, there were some ideas worthy of discussion in Joel's rant that I've been pondering over the past month. The relevant parts of Joel's article are excerpted below

It was seven years ago today when everybody was getting excited about Microsoft's bombastic announcement of Hailstorm, promising that "Hailstorm makes the technology in your life work together on your behalf and under your control."

What was it, really? The idea that the future operating system was on the net, on Microsoft's cloud, and you would log onto everything with Windows Passport and all your stuff would be up there. It turns out: nobody needed this place for all their stuff. And nobody trusted Microsoft with all their stuff. And Hailstorm went away.
...
What's Microsoft Live Mesh?

Hmm, let's see.

"Imagine all your devices—PCs, and soon Macs and mobile phones—working together to give you anywhere access to the information you care about."

Wait a minute. Something smells fishy here. Isn't that exactly what Hailstorm was supposed to be? I smell an architecture astronaut.

And what is this Windows Live Mesh?

It's a way to synchronize files.

Jeez, we've had that forever. When did the first sync web sites start coming out? 1999? There were a million versions. xdrive, mydrive, idrive, youdrive, wealldrive for ice cream. Nobody cared then and nobody cares now, because synchronizing files is just not a killer application. I'm sorry. It seems like it should be. But it's not.

But Windows Live Mesh is not just a way to synchronize files. That's just the sample app. It's a whole goddamned architecture, with an API and developer tools and an insane diagram showing all the nifty layers of acronyms, and it seems like the chief astronauts at Microsoft literally expect this to be their gigantic platform in the sky which will take over when Windows becomes irrelevant on the desktop. And synchronizing files is supposed to be, like, the equivalent of Microsoft Write on Windows 1.0.

As I read the above rant I wondered what world Joel has been living in the past half decade. Hailstorm has actually proven to have been a very visionary and accurate picture of how the world ended up. A lot of the information that used to sit in my desktop in 2001 is now in the cloud. My address book is on Facebook, my photos are on Windows Live Spaces and Flickr, my email is in Hotmail and Yahoo! Mail, while a lot of my documents are now on SkyDrive and Google Docs. Almost all of these services provide XML-based APIs for accessing my data and quite frankly I find it hard to distinguish the ideas behind a unified set of user-centric cloud APIs that was .NET My Services from Google GData. A consistent set of APIs for accessing a user's contact lists, calendar, documents, inbox and profile all stored on the servers of a single company? Sounds like we're in a time warp doesn't it? Even more interesting is that outlandish sounding scenarios at the time such as customers using a delegated authentication model to grant applications and Web sites temporary or permanent access to their data stored in the cloud are now commonplace. Today we have OAuth, Yahoo! BBAuth, Google AuthSub, Windows Live DelAuth and even just the plain old please give us your email password.

In hindsight the major problem with Hailstorm seems to have been that it was a few years ahead of its time and people didn't trust Microsoft. Funny enough, a lot of the key people who were at Microsoft during that era like Vic Gundotra and Mark Lucovsky are now at Google, a company and a brand which the Internet community trusts a lot more than Microsoft, working on Web API strategy. 

All of this is a long winded way of saying I think Joel's comparison of Live Mesh with Hailstorm is quite apt but just not the way Joel meant it. I believe that like Hailstorm, Live Mesh is a visionary project that in many ways tries to tackle problems that people will have or don't actually realize they have. And just like with Hailstorm where things get muddy is separating the vision from the first implementation or "platform experience" of that vision.

I completely agree with Joel that synchronizing files is not a killer application. It just isn't sexy and never will be. The notion of having a ring or mesh of devices where all my files synchronize across each device in my home or office is cool to contemplate from a technical perspective. However it's not something I find exciting or feel that I need even though I'm a Microsoft geek with a Windows Mobile phone, an XBox 360, two Vista laptops and a Windows server in my home. It seems I'm not the only one that feels that way according to a post by a member of the Live Mesh team entitled Behind Live Mesh: What is MOE? which states

Software + Services

When you were first introduced to Live Mesh, you probably played with the Live Desktop.  It’s pretty snazzy.  Maybe you even uploaded a few files too.  Hey, it’s a cool service!  You can store stuff in a cloud somewhere and access it anywhere using a webpage.  Great!

As I look at the statistics on the service though, I notice that a significant portion of our users have stopped here.  This pains me, as there’s a whole lot more you can do with Live Mesh.  Didn’t you hear all the hoopla about Software + Services?  Ever wonder, “Where’s the software?”

You might have noticed that on the device ring there’s a big orange button with a white ‘+’ sign.  The magic happens when you click that big orange button and opt to “add a device” to your mesh. 

So what excites me as a user and a developer about Live Mesh? I believe seamless synchronization of data as a platform feature is really interesting. Today I use OutSync to add people's Facebook photos to their contact information on my Windows Mobile phone. I've written my own RSS reader which synchronizes the state of my RSS subscriptions with Google Reader. Doug Purdy wrote FFSync so he can share his photos, music taste and other data on his Mac with his friends on FriendFeed. It may soon be possible to synchronize my social graph across multiple sites via Google Friend Connect. Google is working on using Google Gears to give me offline access to my documents in Google Docs by synchronizing the state between my desktop and their cloud. Earlier this week Apple announced mobile.me which enables users to synchronize their contacts, emails, calendar and photos across the Web and all their devices.

Everywhere I look data synchronization is becoming more and more important and also more commonplace. I expect this trend to continue over time given the inevitable march of the Web. Being able to synchronize my data and my actions from my desktop to the Web or across Web sites I frequent is extremely enabling. Thus having a consistent  set of standards-based protocols for enabling these scenarios as well as libraries for the key platforms that make this approachable to developers will be very beneficial to users and to the growth of the Web. 

At the rate things are going, I personally believe that this vision of the Web will come to pass with or without Microsoft in the same way that Hailstorm's vision of the Web actually came to pass even though Microsoft canned the project. Whether Microsoft is an observer or a participant in this new world order depends on whether Live Mesh as a product and as a vision fully embraces the Web and collaboration with Web companies (as Google has ably done with GData/OpenSocial/FriendConnect/Gears/etc) or not. Only time will tell what happens in the end.

Now Playing: Usher - In This Club (remix) (feat. Beyonce & Lil Wayne)


 

Categories: Technology

Early this week, Microsoft announced a project code named Velocity. Velocity is a distributed in-memory object caching system in the vein of memcached (aka a Distributed Hash Table).  If you read any modern stories of the architectures of popular Web sites today such as the recently published overview of the LinkedIn social networking site's Web architecture, you will notice a heavy usage of in-memory caching to improve performance. Popular web sites built on the LAMP platform such as Slashdot, Facebook, Digg, Flickr and Wikipedia have all been using memcached to take advantage of in-memory storage to improve performance for years. It is good to see Microsoft step up with a similar technology for Web sites built on the WISC platform.

Like memcached, you can think of Velocity as a giant hash table that can run on multiple servers which automatically handles maintaining the balance of objects hashed to each server and transparently fetches/removes objects from over the network if they aren't on the same machine that is accessing an object in the hash table. In addition, you can add and remove servers from the cluster and the cache automatically rebalances itself.
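
To make the "giant hash table spread across servers" idea concrete, here is a toy sketch of the kind of key-to-host mapping a partitioned cache performs under the covers. This illustrates the general technique used by memcached-style clients rather than Velocity's actual routing code (which the client library handles for you), and the host addresses are made up.

using System;
using System.Collections.Generic;

class PartitionedCacheSketch
{
    // The cache hosts participating in the cluster (hypothetical addresses)
    static readonly List<string> CacheHosts = new List<string>
    {
        "cachehost1:22233",
        "cachehost2:22233",
        "cachehost3:22233"
    };

    // Map a key to one of the cache hosts by hashing it. Real implementations
    // use consistent hashing so that adding or removing a host only remaps a
    // fraction of the keys instead of reshuffling everything.
    static string GetHostForKey(string key)
    {
        int hash = key.GetHashCode() & int.MaxValue; // force non-negative
        return CacheHosts[hash % CacheHosts.Count];
    }

    static void Main()
    {
        foreach (string key in new[] { "NikeAirForce1", "RadioInventory", "Catalog" })
        {
            Console.WriteLine("{0} lives on {1}", key, GetHostForKey(key));
        }
    }
}

Velocity improves on this naive modulo scheme by rebalancing the cluster when hosts join or leave and by transparently fetching objects over the network from whichever host owns them, as described above.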

The Velocity Logical Model

The following diagram taken from the Velocity documentation is helpful in discussing its logical model in detail

[Diagram: Velocity logical model]

In the above diagram, each cache host is a server that participates in the cache cluster. Your application can have multiple named caches (e.g. "Inventory", "Product Catalog", etc.), each of which can be configured separately. In addition, each named cache can have one or more named regions; for example, a Sports region and an Arts region within your Product Catalog cache. Below is some sample code that shows putting and getting objects in and out of a named cache.

CacheFactory CacheCluster1 = new CacheFactory();
Cache inventoryCache = CacheCluster1.GetCache("Inventory");

Sneaker s = (Sneaker)inventoryCache.Get("NikeAirForce1");
s.price = s.price * 0.8; //20% discount
inventoryCache.Put("NikeAirForce1", s);

Velocity ships with the ability to search for objects by tag but it is limited to objects within a specific region. So you can fetch all objects tagged "Nike" or "sneakers" from the sports region of your product catalog. As shown in the above diagram, a limitation of regions is that all items in a region must be on the same physical server. Below is an example of what the code for interacting with regions looks like

CacheFactory CacheCluster1 = new CacheFactory();
Cache catalog = CacheCluster1.GetCache("Catalog");
List<KeyValuePair<string, object>> sneakers = catalog.GetByTag("Sports", "sneakers");

foreach (var kvp in sneakers)
{
    Sneaker s = kvp.Value as Sneaker;
    /* do stuff with Sneaker object */
}

The above sample searches for all items tagged "sneakers" from the Sports region of the Catalog cache.

The notion of regions and tagging is one place Velocity diverges from the simpler model of technologies like memcached and provides more functionality.

Eviction Policy and Explicit Object Expiration

Since memory is limited on a server, there has to be an eviction policy that ensures that the cache doesn't end up growing too big, thus forcing the operating system to get all Virtual Memory on your ass by writing pages to disk. Once that happens you're in latency hell since fetching objects from the cache will involve going to disk to fetch them. Velocity gives you a couple of knobs that can be dialed up or down as needed to control how eviction or expiration of objects from the cache works. There is a file called ClusterConfig.xml which is used for configuring the eviction and expiration policy of each named cache instance. Below is an excerpt of the configuration file showing the policies for some named cache instances.

<!-- Named cache list -->
<caches>
  <cache name="default" type="partitioned">
    <policy>
      <eviction type="lru" />
      <expiration isExpirable="false" />
    </policy>
  </cache>
  <cache name="Inventory" type="partitioned">
    <policy>
      <eviction type="lru" />
      <expiration isExpirable="true" defaultTTL="50" />
    </policy>
  </cache>
</caches>

The above excerpt indicates that the default and Inventory caches utilize a Least Recently Used algorithm for determining which objects are evicted from the cache. In addition, it specifies the default interval after which an object can be considered to be stale in the Inventory cache.

The default expiration interval can actually be overridden when putting an object in the cache by specifying a TTL parameter when calling the Put() method.
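
As a rough sketch, and assuming the CTP exposes a Put overload that accepts a time-to-live (the exact parameter type and signature may differ in the bits you download), overriding the default expiration interval would look something like this:

CacheFactory cacheCluster = new CacheFactory();
Cache inventoryCache = cacheCluster.GetCache("Inventory");

Sneaker s = (Sneaker)inventoryCache.Get("NikeAirForce1");
s.price = s.price * 0.5; // clearance sale

// Hypothetical overload that takes a TTL: keep this particular entry around for
// 10 minutes regardless of the defaultTTL configured for the Inventory cache.
inventoryCache.Put("NikeAirForce1", s, TimeSpan.FromMinutes(10));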

Concurrency Models: None, Optimistic, or Pessimistic

One of the first things you learn about distributed computing in the real world is that locks are bad mojo. In the way locks traditionally work, an object can be locked by a caller, meaning everyone else interested in the object has to wait their turn. Although this prevents errors that would occur from multiple callers manipulating the object at once, it also means there are built-in bottlenecks in your system. So lesson #1 of scaling your service is often to get rid of as many locks in your code as possible. Eventually this leads to systems like eBay, which doesn't use database transactions, and Amazon's Dynamo, which doesn't guarantee data consistency.

So what does this have to do with Velocity? Systems designed to scale massively like memcached don't support concurrency. This leads to developers asking questions like this one taken from the memcached mailing list

Consider this situation:-

  • A list with numeric values: 1,2,3
  • Server1: Gets the list from memcache.
  • Server2: Also gets the list from memcache.
  • Server1: Removes '1' from the list.
  • Server2: Removes '3' from the list.
  • Server1: Puts back a list with 2,3 in list in memcache.
  • Server2: Puts back a list with 1,2 in list in memcache.
Note:Since, both servers have their instance of list objs.
This is not what we need to do. Becoz, both servers are putting an incorrect list in memcache.Ideally what shud have happened was that in the end a list with only '1' shud be put back in memcache. This problem occurs under load and happens in case of concurrent threads.
What I want is that memcache shud restrict Server2 and a consistent list shud be there in memcache. How do I handle such problem in memcache environment?? I know we can handle at application server end by doing all these operations through a centralized place(gets and puts), but how do I handle it in Memcache????
  Any help wud be appreciated?

Unfortunately for the author of the question above, memcached doesn't provide APIs for concurrent access and enforcing data consistency (except for numeric counters). So far, the code samples I've shown for Velocity also do not support concurrency.

However there are APIs for fetching or putting objects that support optimistic and pessimistic concurrency models. In the optimistic concurrency model, instead of taking a lock, the objects are given a version number and the caller is expected to specify the version number of the object they have modified when putting it back in the cache. If the object has been modified since the time it was retrieved then there is a version mismatch error. At this point, the caller is expected to re-fetch the object and make their changes to the newly retrieved object before putting it back in the cache. Below is a code sample taken from the Velocity documentation that illustrates what this looks like in code

/* At time T0, cacheClientA and cacheClientB fetch the same object from the cache */

//-- cacheClientA pulls the FM radio inventory from cache
CacheFactory clientACacheFactory = new CacheFactory();
Cache cacheClientA = clientACacheFactory.GetCache("catalog");
CacheItem radioInventoryA = cacheClientA.GetCacheItem("electronics", "RadioInventory");

//-- cacheClientB pulls the same FM radio inventory from cache
CacheFactory clientBCacheFactory = new CacheFactory();
Cache cacheClientB = clientBCacheFactory.GetCache("catalog");
CacheItem radioInventoryB = cacheClientB.GetCacheItem("electronics", "RadioInventory");

//-- At time T1, cacheClientA updates the FM radio inventory,
//-- which bumps the item's version number in the cache
int newRadioInventoryA = 155;
cacheClientA.Put("electronics", "RadioInventory", newRadioInventoryA,
    radioInventoryA.Tags, radioInventoryA.Version);

//-- Later, at time T2, cacheClientB tries to update the FM radio inventory
//-- using the now stale version it fetched at T0 - AN ERROR OCCURS HERE
int newRadioInventoryB = 130;
cacheClientB.Put("electronics", "RadioInventory", newRadioInventoryB,
    radioInventoryB.Tags, radioInventoryB.Version);

In the pessimistic concurrency model, the caller specifically takes a lock by calling GetAndLock() with a lock time out. The lock is then held until the time out or until the object is put back using PutAndUnlock(). To prevent this from being a performance nightmare, the system does not block requests if a lock is held on an object they want to manipulate. Instead the request is rejected (i.e. it fails).
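
For completeness, here is a rough sketch of what the pessimistic model could look like in code. I'm assuming GetAndLock/PutAndUnlock overloads that take a region name, key, lock timeout and lock handle; check the documentation that comes with the technology preview for the exact signatures.

CacheFactory cacheFactory = new CacheFactory();
Cache catalog = cacheFactory.GetCache("catalog");

// Hypothetical signature: lock the item for up to 30 seconds while we work on it.
// Other callers that try to lock it during that window get an error instead of blocking.
LockHandle lockHandle;
object radioInventory = catalog.GetAndLock("electronics", "RadioInventory",
    TimeSpan.FromSeconds(30), out lockHandle);

int updatedInventory = 155;

// Writing the new value back releases the lock
catalog.PutAndUnlock("electronics", "RadioInventory", updatedInventory, lockHandle);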

Update: Some people have commented here and elsewhere that memcached actually does support the optimistic concurrency model using the gets and cas commands. Sorry about that, it wasn't exposed in the memcached libraries I've looked at.

Final Thoughts

From my perspective, this is a welcome addition to the WISC developer's toolkit. I also like that it raises the bar for what developers should expect from a distributed object cache, which I expect will end up being good for the industry overall and not just for developers on Microsoft's platforms.

If the above sounds interesting, there is already a technology preview available for download from MSDN here. I've downloaded it but haven't tried to run it yet since I don't have enough machines to test it in the ways I would find interesting. As you can expect there is a Velocity team blog. Subscribed.

Now Playing: 50 Cent - These N*ggas Ain't Hood (feat. Lloyd Banks & Marquis)


 

Categories: Web Development

Matt Asay of C|Net has an article entitled Facebook adopts the CPAL poison pill where he writes

Instead, by choosing CPAL, Facebook has ensured that it can be open source without anyone actually using its source. Was that the intent?

As OStatic explains, CPAL requires display of an attribution notice on derivative works. This practice, which effectively requires downstream code to carry the original developer(s)' logo, came to be known as "badgeware." It was approved by the OSI but continues to be viewed with suspicion within the open-source community.

I've written before about how most open-source licenses don't apply themselves well to the networked economy. Only the OSL, AGPL, and CPAL contemplate web-based services. It's not surprising that Facebook opted for one of these licenses, but I am surprised it chose the one least likely to lead to developers actually modifying the Facebook platform.

If the point was to protect the Facebook platform from competition (i.e., derivative works), Facebook chose a good license. If it was to encourage development, it chose the wrong license.

But if the purpose was to prevent modifications of the platform, why bother open sourcing it at all?

I've seen more than one person repeat the sentiment in the above article which leaves me completely perplexed. With fbOpen Facebook has allowed anyone who is interested to run Facebook applications and participate in what is currently the most popular & vibrant social network widget ecosystem in the world.

I can think of lots of good reasons for not wanting to adopt fbOpen. Maybe the code is in PHP and you are a Ruby On Rails shop. Or maybe it conflicts with your company's grand strategy of painting Facebook as the devil and you the heroes of openness (*cough* Google *cough*). However I can't see how requiring that you mention somewhere on your site that your social network's widget platform is powered by the Facebook developer platform is some sort of onerous POISON PILL which prevents you from using it. In the old days, companies used to charge you for the right to say your application is compatible with theirs, heck, Apple still does. So it seems pretty wacky for someone to call Facebook out for letting people use their code and encouraging them to use the Facebook brand in describing their product. Shoot!

The premise of the entire article is pretty ridiculous; it's like calling the BSD License a poison pill license because of its advertising clause. This isn't to say there aren't real issues with an advertising clause, as pointed out in the Free Software Foundation's article The BSD License Problem. However, as far as I'm aware, adopters of fbOpen don't have to worry about being obligated to display dozens of "powered by X" messages because every bit of code they depend on requires similar advertising. So that argument is moot in this case.

Crazy article but I've come to expect that from Matt Asay's writing.

Now Playing: Eminem & D12 - Sh*t On You


 

After weeks of preparatory work we are now really close to shipping the alpha version of the next release of RSS Bandit codenamed Phoenix. As you can see from the above screen shot, the key new feature is that you can read feeds from Google Reader, NewsGator Online and the Windows Common Feed List from RSS Bandit in a fully synchronized desktop experience.

This has been really fun code to write and I'm pretty sure I have a pending blog post in me about REST API design based on my experiences using the NewsGator REST API. The primary work items we have are around updating a bunch of the GUI code to realize that there are now multiple feed lists loaded and not just one. I estimate we'll have a version ready for our users to try out on the 14th or 15th of this month.

Your feedback will be appreciated.

Now Playing: Ben Folds - The Luckiest


 

Categories: RSS Bandit

Recently the folks behind Twitter came clean on the architecture behind the service and it is quite clear that the entire service is being held together by chewing gum and baling wire. Only three MySQL database servers for a service that has the I/O requirements of Twitter? Consider how that compares to other Web 2.0 sites that have come clean with their database numbers: Facebook has 1800, Flickr has 166, even Wikipedia has 20. Talk about bringing a knife to a gunfight.

Given that Twitter has had scaling issues for over a year, it is surprising not only that it has taken them so long to figure out that they need a re-architecture, but more importantly that they decided having a developer/sysadmin manage failover and traffic spikes by hand was cheaper for the business than buying more hardware and a few weeks of coding.

A popular social networking site that focuses on features instead of performance while upstart competitors are waiting in the wings? Sounds like a familiar song, doesn't it? This entire episode reminds me of a story I read in the New York Times a few years ago titled The Wallflower at the Web Party which contains the following familiar sounding excerpts

But the board also lost sight of the task at hand, according to Kent Lindstrom, an early investor in Friendster and one of its first employees. As Friendster became more popular, its overwhelmed Web site became slower. Things would become so bad that a Friendster Web page took as long as 40 seconds to download. Yet, from where Mr. Lindstrom sat, technical difficulties proved too pedestrian for a board of this pedigree. The performance problems would come up, but the board devoted most of its time to talking about potential competitors and new features, such as the possibility of adding Internet phone services, or so-called voice over Internet protocol, or VoIP, to the site.
...
In retrospect, Mr. Lindstrom said, the company needed to devote all of its resources to fixing its technological problems. But such are the appetites of companies fixated on growing into multibillion-dollar behemoths. They seek to run even before they can walk.

“Friendster was so focused on becoming the next Google,” Professor Piskorski said, “that they weren’t focused on fixing the more mundane problems standing in the way of them becoming the next Google.”
...
“We completely failed to execute,” Mr. Doerr said. “Everything boiled down to our inability to improve performance.”

People said about Friendster the same thing they say about Twitter: we travel in tribes, so people won't switch to Pownce or Jaiku because all their friends use Twitter. Well, Friendster thought the same thing until MySpace showed up, and now we have Facebook doing the same to MySpace.

It is a very vulnerable time for Twitter and a savvy competitor could take advantage of that by adding a few features while courting the right set of influential users to jumpstart an exodus. The folks at FriendFeed could be that competitor but I suspect they won't be. Bret & Paul have already boxed their service into being an early adopter's plaything when there's actually interesting mainstream potential for their service. They'd totally put paid to their dreams of being a household brand if they end up simply being a Twitter knock-off, even if they could end up outplaying Evan and Biz at the game they invented.

Now Playing: Bob Marley - Redemption Song


 

The Live Search team has a blog post entitled Wikipedia Gets Big which reveals

Check it out:

[Image: Live Search Wikipedia entry]

We realize that often you just need to get a sense of what your query is about. Wikipedia is great for that — you can learn enough from the first paragraph of a Wikipedia article to start you out on the right path.

For Wikipedia results, we now show a good portion of the first paragraph and a few links from the table of contents. You can see more about the topic right there and see what else the article offers.

We hope you learn more, faster with our expanded Wikipedia descriptions. Let us know what you think.

After trying it out on a few queries like "rain slick precipice", "wireshark" and "jeremy bentham", I definitely see this as a nice addition to the repertoire of features search engines use to give the right answer directly in the search results page. I've already found this to be an improvement compared to Google's habit of linking to definitions on Answers.com.

The interesting thing to note is just how often Wikipedia actually shows up in the top tier of search results for a diverse set of query terms. If you think this feature has legs why not leave a comment on the Live Search team's blog telling them what you think about it?

Now Playing: Abba - The Winner Takes It All


 

Categories: MSN

I've been having problems with hard drive space for years. For some reason, I couldn't get over the feeling that I had less available space on my hard drive than I could account for. I'd run programs like FolderSizes and after doing some back of the envelope calculations it would seem like I should have gigabytes more free space than what was actually listed as available according to my operating system.

Recently I stumbled on a blog post by Darien Nagle which claimed to answer the question Where's my hard disk space gone? with the recommendation that his readers should try WinDirStat. Seeing nothing to lose, I gave it a shot and I definitely came away satisfied. After a quick install, it didn't take long for the application to track down where all those gigabytes of storage I couldn't account for had gone. It turns out there was a hidden folder named C:\RECYCLER that was taking up 4 gigabytes of space.

I thought that was kind of weird so I looked up the folder name and found Microsoft KB 229041 - Files Are Not Deleted From Recycler Folder which listed the following symptoms

SYMPTOMS
When you empty the Recycle Bin in Windows, the files may not be deleted from your hard disk.

NOTE: You cannot view these files using Windows Explorer, My Computer, or the Recycle Bin.

I didn't even have to go through the complicated procedure in the KB article to delete the files, I just deleted them directly from the WinDirStat interface.

My only theory as to how this happened is that some data got orphaned when I upgraded my desktop from Windows XP to Windows 2003 since the user accounts that created them were lost. I guess simply deleting the files from Windows Explorer as I did a few years ago wasn't enough.

Good thing I finally found a solution. I definitely recommend WinDirStat, the visualizations aren't half bad either.

Now Playing: Eminem - Never Enough (feat. 50 Cent & Nate Dogg)


 

Categories: Technology

June 1, 2008
@ 01:46 PM

A coworker forwarded me a story from a Nigerian newspaper about a cat turning into a woman in Port Harcourt, Nigeria. The story is excerpted below

This woman was reported to have earlier been seen as a cat before she reportedly turned into a woman in Port Harcourt, Rivers State, on Thursday. Photo: Bolaji Ogundele. WHAT could be described as a fairy tale turned real on Wednesday in Port Harcourt, Rivers State, as a cat allegedly turned into a middle-aged woman after being hit by a commercial motorcycle (Okada) on Aba/Port Harcourt Expressway.

Nigerian Tribune learnt that three cats were crossing the busy road when the okada ran over one of them which immediately turned into a woman. This strange occurrence quickly attracted people around who descended on the animals. One of them, it was learnt, was able to escape while the third one was beaten to death, still as a cat though.

Another witness, who gave his name as James, said the woman started faking when she saw that many people were gathering around her. “I have never seen anything like this in my life. I saw a woman lying on the road instead of a cat. Blood did not come out of her body at that time. When people gathered and started asking her questions, she pretended that she did not know what had happened," he said.

Reading this reminds me of how commonplace it was in Nigeria to read the kind of mind-boggling supernatural stories you'd expect to see in the Weekly World News in regular newspapers, alongside sports, political and stock market news. Unlike the stories of alien abduction you find in the U.S., the newspaper stories of supernatural events often had witnesses and signed confessions from the alleged perpetrators of supernatural acts. Nobody doubted these stories, everyone knew they were true. Witches would confess to being behind the run of bad luck of their friends & family, or confess that the key to their riches was offering their family members or children as blood sacrifices to ancient gods. It was all stuff I read in the daily papers as a kid as I would be flipping through looking for the comics.

The current issue of Harper's talks about the penis snatching hysteria from my adolescent years. The story is summarized by Slate magazine as shown below

Harper's, June 2008
An essay reflects on the widespread reports of "magical penis loss" in Nigeria and Benin, in which sufferers claim their genitals were snatched or shrunken by thieves. Crowds have lynched accused penis thieves in the street. During one 1990 outbreak, "[m]en could be seen in the streets of Lagos holding on to their genitalia either openly or discreetly with their hand in their pockets." Social scientists have yet to identify what causes this mass fear but suspect it is what is referred to as a "culture-bound syndrome," a catchall term for a psychological affliction that affects people within certain ethnic groups.

I remember that time fairly well. I can understand that this sounds like the kind of boogie man stories that fill every culture. In rural America, it is aliens in flying saucers kidnapping people for anal probes and mutilating cows. In Japan, it's the shape changing foxes (Kitsune). In Nigeria, we had witches who snatched penises and could change shape at will.

In the cold light of day it sounds like mass hysteria but I wonder which is easier to believe sometimes. That a bunch of strangers on the street had a mass hallucination that a cat transformed into a woman or that there really are supernatural things beyond modern science's understanding out there? 

Now Playing: Dr. Dre - Natural Born Killaz (feat. Ice Cube)