Puppet provides a mechanism for managing a heterogeneous cluster of Unix-like machines using a central configuration system and a declarative scripting language for describing machine configuration. The declarative language abstracts away the many differences between the various Unix-like operating systems.

Puppet is used for server management by a number of startups including Powerset, Joost and Slide.

Typical Puppet Architecture

In a system managed using Puppet, the Puppet Master is the central system-wide authority for configuration and coordination. Manifests are propagated to the Puppet Master from a source external to the system. Each server in the system periodically polls the Puppet Master to determine whether its configuration is up to date. If it is not, the new configuration is retrieved and the changes described by the new manifest are applied. The Puppet instance running on each client can be thought of as being made up of the following layers.

 

Each manifest is described in the Puppet Configuration Language, which is a high level language for describing resources on a server and what actions to take on them. Retrieving the newest manifests and applying the changes they describe (if any) is handled by Puppet's Transactional Layer. The Puppet Configuration Language is actually an abstraction that hides the differences between the various Unix-like operating systems. This abstraction layer maps the higher level resources in a manifest to the actual commands and file locations on the target operating system of each server.

What Does Puppet Do?

The Puppet Master is the set of one or more servers that run the puppetmasterd daemon. This daemon listens for polling requests from the servers being managed by Puppet and returns the current configuration for that server. Each server to be managed by Puppet must have the Puppet client installed and must run the puppetd daemon, which polls the Puppet Master for configuration information.

Each manifest can be thought of as a declarative script which contains one or more commands (called resources in Puppet parlance) along with their parameters, dependencies and the prerequisites to running each command. Collections of resources can be grouped together as classes (complete with inheritance) which can be further grouped together as modules. See the examples below.

Resource

    service { "apache": require => Package["httpd"] }

The apache service resource requires that the httpd package be installed.

Class

    class apache {
        service { "apache": require => Package["httpd"] }
        file { "/nfs/configs/apache/server1.conf":
            group => "www-data"
        }
    }

Groups together the rule that the apache service requires the httpd package to be installed and the rule that the server1.conf Apache configuration file should be owned by the www-data group.

Derived Class

    class apache-ssl inherits apache {
        Service[apache] { require +> File["apache.pem"] }
    }

The apache-ssl class inherits everything defined in the apache class and additionally specifies that the apache service also requires the existence of the apache.pem configuration file.

Module

    class webserver::apache-ssl inherits apache {
        Service[apache] { require +> File["apache.pem"] }
    }

The apache-ssl class is part of the webserver module.

Node

    node "webserver.example.com" {
        include webserver
    }

Declares that the manifest for the machine named webserver.example.com is the webserver module.

A node describes the configuration for a particular machine given its name. Node names and their accompanying configuration can be defined directly in manifests as shown above. Another option is to either use external node classifiers, which provide a dynamic mechanism for determining a machine's type based on its name, or use an LDAP directory for storing information about nodes in the cluster.
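For illustration, an external node classifier is just a script that Puppet invokes with the node name and that prints YAML listing the classes to apply (a non-zero exit code means the node is unknown). Below is a minimal sketch in Python; the hostname-to-role mapping and class names are made up, and it assumes PyYAML is installed.

    #!/usr/bin/env python
    # Hypothetical external node classifier sketch: Puppet calls this script with
    # the node name as its only argument and expects YAML naming the classes to apply.
    # The hostname-to-role mapping below is invented for illustration.
    import sys
    import yaml

    ROLES = {
        "webserver": ["webserver::apache-ssl"],
        "db": ["database"],
    }

    def classify(node_name):
        # Derive a role from the first hostname label, e.g. "webserver12.example.com" -> "webserver"
        role = node_name.split(".")[0].rstrip("0123456789")
        return ROLES.get(role)

    if __name__ == "__main__":
        classes = classify(sys.argv[1])
        if classes is None:
            sys.exit(1)  # unknown node
        print(yaml.dump({"classes": classes}))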

FURTHER READING

  • Puppet Type Reference – List of built-in resource types abstracted by the Puppet configuration language.
  • Puppet Language Tutorial – Introduction to the various language constructs in the Puppet language including classes, modules, conditionals, variables, arrays and functions.

Now Playing: Linkin Park - Cure For The Itch


 

Categories: Web Development

Whenever you read stories about how Web companies like Facebook have 10,000 servers including 1800 database servers or that Google has one million servers, do you ever wonder how the system administrators that manage these services deal with deployment, patching, failure detection and system repair without going crazy? This post is the first in a series of posts that examines some of the technologies that successful Web companies use to manage large Web server farms.

Last year, Michael Isard of Microsoft Research wrote a paper entitled Autopilot: Automatic Data Center Management which describes the technology that Windows Live and Live Search services have used to manage their server farms. The abstract of his paper is as follows

Microsoft is rapidly increasing the number of large-scale web services that it operates. Services such as Windows Live Search and Windows Live Mail operate from data centers that contain tens or hundreds of thousands of computers, and it is essential that these data centers function reliably with minimal human intervention. This paper describes the first version of Autopilot, the automatic data center management infrastructure developed within Microsoft over the last few years. Autopilot is responsible for automating software provisioning and deployment; system monitoring; and carrying out repair actions to deal with faulty software and hardware. A key assumption underlying Autopilot is that the services built on it must be designed to be manageable. We also therefore outline the best practices adopted by applications that run on Autopilot.

The paper provides a high level overview of the system, its design principles and the requirements for applications/services that can be managed by the system. It gives a lot of insight into what it takes to manage a large server farm while keeping management and personnel costs low.

The purpose of AutoPilot is to automate and simplify the basic tasks that system administrators typically perform in a data center. This includes installation and deployment of software (including operating systems and patches), monitoring the health of the system, taking basic repair actions and marking systems as needing physical repair or replacement.

However applications that will be managed by AutoPilot also have their responsibilities. The primary responsibilities of these applications include being extremely fault tolerant (i.e. applications must be able to handle processes being killed without warning) and being capable of running during large outages in the cloud (i.e. with up to 50% of the servers out of service). In addition, these applications need to be easy to install and configure which means that they need to be xcopy deployable. Finally, the application developers are responsible for describing which application specific error detection heuristics AutoPilot should use when monitoring their service.

Typical AutoPilot Architecture

 

The above drawing is taken from the research paper. According to the paper, the tasks of the various components are as follows

The Device Manager is the central system-wide authority for configuration and coordination. The Provisioning Service and Deployment Service ensure that each computer is running the correct operating system image and set of application processes. The Watchdog Service and Repair Service cooperate with the application and the Device Manager to detect and recover from software and hardware failures. The Collection Service and Cockpit passively gather information about the running components and make it available in real-time for monitoring the health of the service, as well as recording statistics for off-line analysis. (These monitoring components are "Autopiloted" like any other application, and therefore communicate with the Device Manager and Watchdog Service which provide fault recovery, deployment assistance, etc., but this communication is not shown in the figure for simplicity.)

The typical functioning of the system is described in the following section.

What Does AutoPilot Do?

The set of machine types used by the application (e.g. Web crawler, front end Web server, etc.) needs to be defined in a database stored on the Device Manager. A server's machine type dictates what configuration files and application binaries need to be installed on the server. This list is manually defined by the system administrators for the application. The Device Manager also keeps track of the current state of the cluster, including which machines of the various machine types are online and their health status.

The Provisioning Service continually scans the network looking for new servers that have come online. When a new member of the server cluster is detected, the Provisioning Service asks the Device Manager what operating system image it should be running and then images the machine with a new operating system before performing burn-in tests. If the tests are successful, the Provisioning Service informs the Device Manager that the server is healthy. In addition to operating system components, some AutoPilot specific services are also installed on the new server. There is a dedicated filesync service that ensures that the correct files are present on the computer and an application manager that ensures that the expected application binaries are running.

Both services determine what the right state of the machine should be by querying the Device Manager. If it is determined that the required application binaries and files are not present on the machine then they are retrieved from the Deployment Service. The Deployment Service hosts the various application manifests, which map to the various application folders, binaries and data files. These manifests are populated from the application's build system, which is outside the AutoPilot system.
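The reconciliation a filesync-style service performs boils down to comparing a manifest against what is on disk and fetching anything that is missing or stale. Here is a minimal illustrative sketch in Python; the manifest format and the fetch callable are invented for illustration and are not taken from AutoPilot.

    import hashlib
    import os

    def file_hash(path):
        """Return the SHA-1 hash of a file's contents."""
        with open(path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()

    def reconcile(manifest, app_root, fetch):
        """Bring the local directory in line with the manifest.

        manifest: dict mapping relative path -> expected content hash
        fetch: callable that retrieves a file's bytes from the deployment service
        """
        for rel_path, expected in manifest.items():
            local = os.path.join(app_root, rel_path)
            if not os.path.exists(local) or file_hash(local) != expected:
                os.makedirs(os.path.dirname(local), exist_ok=True)
                with open(local, "wb") as f:
                    f.write(fetch(rel_path))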

The Deployment Service also comes into play when a new version of the application is ready to be deployed. During this process a new manifest is loaded into the Deployment Service and the Device Manager informs the various machine types of the availability of the new application bits. Each machine type has a notion of an active manifest, which allows the bits for a new version of the application to be deployed as an inactive manifest while the old version of the application is still considered "active". The new version of the application is rolled out in chunks called "scale units". A scale unit is a group of multiple machine types which can number up to 500 machines. Partitioning the cluster into scale units allows code roll outs to be staged. For example, if you have a cluster of 10,000 machines with scale units of 500 machines, then AutoPilot could be configured to keep roll outs to at most 5 scale units at a time so that no more than 25% of the cloud is being upgraded at once.
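To make the staging arithmetic concrete, here is a small sketch; the function is my own illustration of the math described above, not anything from Autopilot.

    def rollout_plan(total_machines, scale_unit_size, max_concurrent_units):
        """Compute how much of the cluster is in flight per wave and how many waves are needed."""
        total_units = total_machines // scale_unit_size
        in_flight = max_concurrent_units * scale_unit_size
        fraction = in_flight / total_machines
        waves = -(-total_units // max_concurrent_units)  # ceiling division
        return in_flight, fraction, waves

    # Using the numbers from the example above: 10,000 machines, 500-machine scale
    # units, at most 5 units upgrading at once -> 2,500 machines (25%) per wave, 4 waves.
    print(rollout_plan(10_000, 500, 5))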

Besides operating system installation and deployment of application components, AutoPilot is also capable of monitoring the health of the service and taking certain repair actions. The Watchdog Service is responsible for detecting failures in the system. It does so by probing each of the servers in the cluster and testing various properties of the machines and the application(s) running on them based on various predetermined criteria. Each watchdog can report one of three results for a test: OK, Warning or Error. A Warning does not initiate any repair action and simply indicates a non-fatal error has occurred. When a watchdog reports an error back to the Device Manager, the machine is placed in the Failure state and one of the following repair actions is taken: DoNothing, Reboot, ReImage or Replace. The choice of repair action depends on the failure history of the machine. If this is the first error that has been reported on the machine in multiple days or weeks then it is assumed to be a transient error and the appropriate action is DoNothing. If not, the machine is rebooted and if after numerous reboots the system is still detected to be in error by the watchdogs it is re-imaged (a process which includes reformatting the hard drive and reinstalling the operating system as well as redeploying application bits). If none of these solves the problem then the machine is marked for replacement and it is picked up by a data center technician during weekly or biweekly sweeps to remove dead servers.
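The escalation logic can be sketched roughly as follows. This is my own illustrative policy in Python with made-up thresholds and data structures, not code or exact rules from the paper.

    from datetime import datetime, timedelta

    # DoNothing -> Reboot -> ReImage -> Replace, escalating with recent failure count.
    REPAIR_ACTIONS = ["DoNothing", "Reboot", "ReImage", "Replace"]

    def choose_repair_action(failure_history, now=None, quiet_period=timedelta(days=7)):
        """Pick a repair action based on how often this machine has failed recently.

        failure_history: list of datetimes of previously reported errors
        """
        now = now or datetime.utcnow()
        recent = [t for t in failure_history if now - t < quiet_period]
        if not recent:
            return "DoNothing"  # first error in a long while: assume it was transient
        # Escalate one step per recent failure, capping at Replace
        step = min(len(recent), len(REPAIR_ACTIONS) - 1)
        return REPAIR_ACTIONS[step]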

System administrators can also directly monitor the system using data aggregated by the Collection Service, which collects information from various performance counters. This data is written to a large-scale distributed file store for offline data mining and to a SQL database where it can be visualized as graphs and reports in a visualization tool known as the Cockpit.

Now Playing: Nirvana - Jesus Doesn't Want Me For A Sunbeam


 

One of the valuable ideas from Frederick Brooks' classic The Mythical Man Month is the notion of the second system effect when designing software systems. The following is an excerpt from the book where the concept is introduced

An architect’s first work is apt to be spare and clean. He knows he doesn’t know what he’s doing, so he does it carefully and with great restraint.

As he designs the first work, frill after frill and embellishment after embellishment occur to him. These get stored away to be used “next time.” Sooner or later the first system is finished, and the architect, with firm confidence and a demonstrated mastery of that class of systems, is ready to build a second system.

This second is the most dangerous system a man ever designs. When he does his third and later ones, his prior experiences will confirm each other as to the general characteristics of such systems, and their differences will identify those parts of his experience that are particular and not generalizable.

The general tendency is to over-design the second system, using all the ideas and frills that were cautiously sidetracked on the first one.

In my experience, the second system effect doesn't just come into play when the creator(s) of the second system also worked on the first system. In many cases, a development team may be brought in to build a new version of a first system either because they are competitors trying to one up a successful product or they are building the next generation of the technology while the original team goes into "maintenance mode" (i.e. shades of The Soul of a New Machine). In such situations, the builders of the second system can fall into the trap that Frederick Brooks describes in chapter 5 of Mythical Man Month.

At almost any point in time over the past few years, I could easily count about three or four software projects I was personally familiar with that were making classic "second system" mistakes. However instead of vilifying these projects, I thought it would be useful to list the top three things I've seen that separate the second systems that avoid falling into this trap from those that don't.

Realize You Can't Do It All In One Release

A lot of projects lose their way because they try to do too much in a single release. As Raymond Chen wrote, You don't know what you do until you know what you don't do. Until you start focusing on a key set of scenarios your product will nail and start cutting features that aren't necessary to hitting those scenarios, your project isn't ready for prime time. What developers often fail to remember is that there is always another version and you can fit those scenarios in at that point.

One company that gets this idea very well is Apple. I can still remember Cmdr Taco's infamous review of the original iPod when it ran in 2001; "No wireless. Less space than a nomad. Lame." However the iPod nailed its key scenarios and with that success kept expanding its set of key scenarios (more space, video, photos, etc) until it became the cultural juggernaut it is today. You can see the same qualities in the iPhone. Just a few months ago, you'd read articles like Final report: The iPhone is not open for business that argued against the iPhone because it didn't support 3G, lacked Exchange support and had a non-existent developer platform. However the original iPhone was still successful and they addressed these issues in the next version to even greater success.

You Can be Date Driven or Feature Driven but not Both

A date driven release is one where everyone on the team is working to hit a particular time cut off after which the product will ship with or without their feature. Software products that have to hit the back to school cycle, tax time or the holiday shopping cycle are often date driven. A feature driven software release takes the "we won't ship it until it is ready" approach which is popular among Open Source projects and at companies like Google (according to Steve Yegge).

The thing to note about both approaches is that they are built on compromise. That is, we will compromise on our ship date but not on our features, or vice versa. Where software projects tend to go awry is when they decide to be both feature driven and date driven, because it means they have left no room to compromise. This is additionally problematic because we are so poor at project estimation in our industry. So at the start of a project you have features that should take two years to ship budgeted as needing only a year of work. In a date driven release, once this discrepancy is realized, features start getting cut or "placed below the cut line". In a feature driven release, the ship date and ship expectations are adjusted.

Projects that are both feature driven and date driven (i.e. we have to ship features X, Y & Z on date A) end up delaying these decisions until the very last minute since they aren't mentally set up to compromise on either the date or the features. This leads to missed deadlines, hastily cut features and demoralization within the product team. This often continues for multiple deadlines until the project team finally feels it must show something for all the missed deadlines and cut features, and throws together a mediocre release. We've all seen software projects that have succumbed to this and it is a sad sight to behold.

Don't Lose Track of What Made the First System Successful

Developers tend to be a fairly critical lot so when they look at a successful "first system", they often only see the flaws. This is often what fuels the second system effect and leads to losing sight of why the first system became a hit in the first place. A recent example of this is the search engine Cuil which was started by some former employees of Google with the intent of building a search engine which fixes the issues with Google's search engine. Unfortunately, they had a disastrous product launch which has been documented in blog posts like How To Lose Your Cuil 20 Seconds After Launch and news articles such as Cuil shows us how not to launch a search engine.

When you look back at the PR buildup leading to Cuil's launch, it is interesting to note that even though the Cuil team dubbed themselves Google slayers they did not address the key things people like about Google's search. Google's search provides relevant search results as quickly as possible. Cuil bragged about providing more complete results because their search index was bigger, showing more results above the fold by going with three columns in their search engine results page and that it offered richer query refinement features than Google. Although all of these are weaknesses in Google's user experience they are trumped by the fact that Google provides extremely relevant search results.  The Cuil team lost sight of this, probably because working at Google they only ever talked about fixing the flaws in the search product instead of also internalizing what has made it so successful.

This is an extremely common mistake that cuts across all categories of software products.

Now Playing: Young Jeezy - Motivation


 

Categories: Programming

Earlier today I read two posts that were practically mirror opposites of each other. The first was Paul Graham's essay, The Pooled-Risk Company Management Company, where he makes a case against founding a company that you intend to run as a successful business over the long term, and instead argues for building a company that you can sell to a public company so you can move on and enjoy your money. His argument is excerpted below

At this year's startup school, David Heinemeier Hansson gave a talk in which he suggested that startup founders should do things the old fashioned way. Instead of hoping to get rich by building a valuable company and then selling stock in a "liquidity event," founders should start companies that make money and live off the revenues. Sounds like a good plan. Let's think about the optimal way to do this.

One disadvantage of living off the revenues of your company is that you have to keep running it. And as anyone who runs their own business can tell you, that requires your complete attention. You can't just start a business and check out once things are going well, or they stop going well surprisingly fast.

The main economic motives of startup founders seem to be freedom and security. They want enough money that (a) they don't have to worry about running out of money and (b) they can spend their time how they want. Running your own business offers neither. You certainly don't have freedom: no boss is so demanding. Nor do you have security, because if you stop paying attention to the company, its revenues go away, and with them your income.

The best case, for most people, would be if you could hire someone to manage the company for you once you'd grown it to a certain size.
...
If such pooled-risk company management companies existed, signing up with one would seem the ideal plan for most people following the route David advocated. Good news: they do exist. What I've just described is an acquisition by a public company.

Austin Wiltshire has a counterpoint to Paul Graham's article entitled New hire cannon fodder where he decries the practice of "exploiting" junior developers so that a couple of fat cats with money can get even richer. Parts of Austin's counter argument are excerpted below

Why these large firms, and now even places like YCombinator continually think the best way to move forward in software is to hire as many gullible young naive programmers as possible and work them to death is beyond me.  It’s pretty well known that 80 hour work weeks and inexperience is a guarantee to continually make the same damn mistakes over and over again.  It’s also an open question as to why new hires let these companies take advantage of them so badly.  Paul Graham had a start up, he begged for angel investing, and his life should show you - what does he do now?  Well he learned from his experience that designing and building is for chumps, to make the big bucks and sit on your ass you become an angel investor.

Kids will work for pennies.  You can continue to fill their heads with dreams of having the next big idea, even though they are carrying all the risk for you.  Junior developers, whether entrepreneurs or otherwise, are being asked to give up their 20’s, probably the best, most energetic years of their lives, to have a chance at making a dent in someone else’s bottom line.  (Make note, the one exception here I’ve seen is 37 Signals :) )

But have we never stopped to think who truly is benefiting from all these hours?  Do we get paid more?  No.  In fact, because many of us are salaried, we’re effectively paid less.  Are we compensated with faster promotions?  Possibly - but don’t forget about that silicon ceiling.  The only person who knows how many hours you’re putting in is probably just the guy above you - but he makes sure to show just how productive his department is (via your hard work) to everyone.  He will always get the spoils.  Who will end up really getting the spoils out of any of YCombinator’s work?  Paul Graham.

Both arguments have their merits and there are also parts I disagree with on both sides. Austin is right that YCombinator takes advantage of the naivety of youth. However when you are in your 20s with no serious attachments (like a mortgage, a family or even a sick relative) it doesn't sound like a bad idea to make hay while the sun shines. If you can sacrifice some time in your youth for a chance at a better life for yourself and your future spouse/kids/girlfriend/family/etc in a few years, is it wrong to treat that as an opportunity? Especially if you'll be working in an energetic environment surrounded by likeminded souls all believing that you are building cool stuff? Additionally, if the startup doesn't work out [which it most likely won't] the experience will still turn out to be useful when you decide to get a regular job at some BigCo even if it is just realizing how good you have it to no longer have to work 80 hour weeks while eating Top Ramen for breakfast and dinner any more.

From that perspective I don't think Austin is right to completely rail against the startup lifestyle. However I totally agree with the general theme of Austin's post that working ridiculous hours is dumb. It should be common knowledge that sleep deprivation impairs brain function and may even lead to psychiatric disorders. The code you check-in during your 14th hour at your desk will not be as good as what you checked in during your 4th. If you really have to work that much, work on the weekends instead of spending over 12 hours sitting in front of your IDE. Even then, busting your butt to that extent only makes sense if you not only get to share the risks of failure but also the rewards of success as well. This means you better be a co-founder or someone with equity and not just some poor sap on salary.

My issue with Paul Graham's essay and the investment style of YCombinator is that it sells startup founders short. Paul recently wrote an essay entitled Cities and Ambition where he had this beautiful quote about the kind of "peer pressure" the Silicon Valley area exerts on startup founders

When you ask what message a city sends, you sometimes get surprising answers. As much as they respect brains in Silicon Valley, the message the Valley sends is: you should be more powerful.

That's not quite the same message New York sends. Power matters in New York too of course, but New York is pretty impressed by a billion dollars even if you merely inherited it. In Silicon Valley no one would care except a few real estate agents. What matters in Silicon Valley is how much effect you have on the world. The reason people there care about Larry and Sergey is not their wealth but the fact that they control Google, which affects practically everyone.

Read the above quote again and let its message sink in. The great thing about software is how you can literally take nothing (i.e. a blank computer screen) and build something that changes the world. Bill and Paul did it with Microsoft. Larry and Sergey have done it with Google. Jerry and David did it with Yahoo!, and some might say Mark Zuckerberg is doing it with Facebook.

Are any of those companies YCombinator-style, built-to-flip companies? Nope.

I strongly believe in the idea behind the mantra "Change the World or Go Home". Unlike anything that has come before it, the combination of software and the World Wide Web has the potential to connect people and empower them in more ways than humanity has ever seen. And it is possible to become immensely rich while moving humanity forward with the software that you create.

So if you have decided to found a startup, why decide to spend your youth building some "me too" application that conforms to all the current "Web 2.0" fads in the desperate hope that you can convince some BigCo to buy you out? That sounds like such a waste. Change the world or go home.

Now Playing: Wu-Tang Clan - Triumph


 

It's been just over a month since we released the alpha of the next release of RSS Bandit codenamed Phoenix. Below are a couple of posts about the alpha from some popular blogs

The tone of the feedback was generally the same. People were very interested in the synchronization with Google Reader but were dissatisfied due to bugs or performance issues. I hadn't expected so much interest in an alpha release; otherwise we would have been more diligent about bug fixing and performance improvements. Anyway, there was a ton of great feedback and we fixed a bunch of bugs including the following issues

  • Application hangs on shutdown due to search indexing [bug 1967898]
  • Feeds in tree view not sorted in alphabetical order [bug 1999533]
  • Mouse wheel doesn't work when attempting to scroll feed list [bug 1999534]
  • Removing a synchronized feed source deletes all items across all feed sources [bug 1999800]
  • Exception when loading feed list from NewsGator Online [bug 2000390]
  • Feed logos are broken image links in feeds synchronized from NewsGator Online [bug 2000764]
  • Context menu for a feed source doesn't contain the option to remove the feed source [bug 2000808]
  • Exception when loading feed list from Google Reader [bug 20001419]
  • Space not accepted in Name field of Synchronize Feeds dialog [bug 20001908]
  • Google Reader synchronizes feeds but not items within feeds [bug 2001911]
  • Favorite icon not downloaded for Google Reader feeds [bug 2001915]
  • Selecting Unread Items search folder displays error message [bug 2001916]
  • Incorrect NewsGator or Google Reader password cannot be changed [bug 2002144]
  • Crash on attempting to download an enclosure [bug 2004646]
  • Google Reader password communicated over the wire in plain text [bug 2005154]
  • Option to take over network settings from Internet Explorer does not allow specifying proxy server password [bug 2005687]
  • Unable to play a downloaded video from the Download Manager [bug 2005854]
  • Media keeps playing after closing a browser tab [bug 2014408]
  • Custom column layouts not remembered for specific feeds [bug 2022242]
  • Category dropdown in Add Subscription Wizard doesn't match selected feed source [bug 2026658]

There were also a couple of performance and memory usage improvements that were made along the way. I still have one or two issues that I've been having problems reproducing such as one situation where we end up getting a streak of timeouts while waiting for HTTP responses from Google Reader. I suspect it has to do with making lots of requests to a single domain in rapid succession if you're behind a proxy server but I might be wrong. Despite that one issue, the application is now a lot more usable and is feature complete for the release. If you hit that issue, you can either wait a couple of minutes before retrying to refresh feeds or restart the application to clear it up.
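For what it's worth, a common mitigation for this kind of problem is to enforce a minimum delay between consecutive requests to the same host. Below is an illustrative sketch of that idea in Python; it is not the actual RSS Bandit code, which is written in C#.

    import time

    class PerHostThrottle:
        """Enforce a minimum spacing between requests to any single host."""

        def __init__(self, min_interval_seconds=2.0):
            self.min_interval = min_interval_seconds
            self.last_request = {}  # host -> timestamp of the last request

        def wait(self, host):
            """Block until it is polite to issue another request to this host."""
            now = time.monotonic()
            earliest = self.last_request.get(host, 0) + self.min_interval
            if now < earliest:
                time.sleep(earliest - now)
            self.last_request[host] = time.monotonic()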

You can download the beta version of Phoenix from RssBandit.Phoenix.Beta.Installer.zip. There are two files in the installer package; I suggest running setup.exe because it validates that you have the correct prerequisites to run the application and tells you where to get them otherwise.

If you have any problems feel free to file a bug on SourceForge or ask a question on our forum. Thanks for using our software.

PS: As RSS Bandit is a hobbyist application worked on in our free time, we rely on the generosity of our users when it comes to providing translations of our application. If you look at the supported language matrix for RSS Bandit, you'll see the languages into which the application has been translated in previous versions. We would love to get translators for those languages again and for any new languages as well. If you'd like to provide your skills as a translator for the next release of RSS Bandit and believe you can get this done in the next month or two then please send mail to . We'd appreciate your help.

Now Playing: Lil Wayne - Best Rapper Alive


 

Categories: RSS Bandit

July 28, 2008
@ 02:27 AM

For several months Nick Carr has pointed out that Wikipedia ranks highly in the search results for a number of common topics in Google's search engine. In his post entitled Googlepedia Nick Carr speculated on why Google would see this trend as a threat in a paragraph which is excerpted below

I'm guessing that serving as the front door for a vast ad-less info-moshpit outfitted with open source search tools is not exactly the future that Google has in mind for itself. Enter Knol.

Clearly Nick Carr wasn't the only one who realized that Google was slowly turning into a Wikipedia redirector. Google wants to be the #1 source for information, or at least be serving ads on the #1 sites on the Internet in specific areas. Wikipedia was slowly eroding the company's effectiveness at achieving both goals. So it is unsurprising that Google has launched Knol and is trying to entice authors away from Wikipedia by offering them a chance to get paid.

What is surprising is that Google is tipping its search results to favor Knol. Or at least that is the conclusion of several search engine optimization (SEO) experts, and it also jibes with my experiences.

Danny Sullivan wrote in his post The Day After: Looking At How Well Knol Pages Rank On Google that

We've been assured that just because content sits on Google's Knol site, it won't gain any ranking authority from being part of the Knol domain. OK, so a day after Knol has launched, how's that holding up? I found 1/3 of the pages listed on the Knol home page that I tested ranked in the top results.

I was surprised to see a post covering how Knol's How to Backpack was already hitting the number three spot on Google. Really? I mean, how many links could this page have gotten already? As it turns out, quite a few. And more important, it's featured on the Knol home page, which itself is probably one of the most important links. While Knol uses nofollow on individual knols to prevent link credit from flowing out, it's not used on the home page -- so home page credit can flow to individual knols featured on it.

here's a test knol I made yesterday -- Firefox Plugins For SEO & SEM -- which ranks 28 for firefox plugins for seo. I never linked to it from my article about knol. I don't think it made the Knol home page. I can see only three links pointing at it, and only one of those links uses anchor text relevant to what the page is ranking for. And it's in the top 30 results?

Look, I know that being ranked 28 is pretty much near invisible in terms of traffic you'll get from search engines. But then again, to go from nowhere to the 28th top page in Google out of 755,000 matches? I'm sorry -- don't tell me that being in Knol doesn't give your page some authority.

Aaron Wall noticed something even more insidious in his post entitled Google Knol - Google's Latest Attack on Copyright, where he observes that when Google detects duplicate content it favors the copy on Knol over a site that has existed for years and has decent PageRank. His post is excerpted below

Another Knol Test

Maybe we are being a bit biased and/or are rushing to judgement? Maybe a more scientific effort would compare how Knol content ranks to other content when it is essentially duplicate content? I did not want to mention that I was testing that when I created my SEO Basics Knol, but the content was essentially a duplicate of my Work.com Guide to Learning SEO (that was also syndicated to Business.com). Even Google shows this directly on the Knol page

Google Knows its Duplicate Content

Is Google the Most Authoritative Publisher?

Given that Google knows that Business.com is a many year old high authority directory and that the Business.com page with my content on it is a PageRank 5, which does Google prefer to rank? Searching for a string of text on the page I found that the Knol page ranks in the search results.

If I override some of Google's duplicate content filters (by adding &filter=0 to the search string) then I see that 2 copies of the Knol page outrank the Business.com page that was filtered out earlier.

Interesting.

Following Danny's example, I also tried running some searches for terms that appear on the Knol homepage and seeing how they did in Google's search. Here's the screenshot of the results of searching for "buttermilk pancakes"

Not bad for a page that has existed on the Web for less than two weeks.

Google is clearly favoring Knol content over content from older, more highly linked sites on the Web. I won't bother with the question of whether Google is doing this on purpose or whether this is some innocent mistake. The important question is "What are they going to do about it now that we've found out?"

Now Playing: One Republic - Stop and Stare
 

There was an interesting presentation at OSCON 2008 by Evan Henshaw-Plath and Kellan Elliott-McCrea entitled Beyond REST? Building Data Services with XMPP PubSub. The presentation is embedded below.

The core argument behind the presentation can be summarized by this tweet from Tim O'Reilly

On monday friendfeed polled flickr nearly 3 million times for 45000 users, only 6K of whom were logged in. Architectural mismatch. #oscon08

On July 21st, FriendFeed had 45,000 users who had associated their Flickr profiles with their FriendFeed account. FriendFeed polls Flickr about once every 20 – 30 minutes to see if the user has uploaded new pictures. However only about 6,000 of those users logged into Flickr that day, let alone uploaded pictures. Thus there were literally millions of HTTP requests made by FriendFeed that were totally unnecessary.
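To put those numbers in perspective, here is a quick back-of-the-envelope calculation in Python using the figures above; the only assumption is the 20 – 30 minute poll interval.

    # Rough check on the polling numbers quoted in the tweet above.
    users = 45_000
    polls_per_user_per_day = [24 * 60 // interval for interval in (20, 30)]  # every 20 or 30 minutes

    total_requests = [users * p for p in polls_per_user_per_day]
    print(total_requests)  # roughly 2.2 to 3.2 million requests per day

    # Compare with a push model: only the ~6,000 users who actually logged in
    # that day could have generated updates worth notifying FriendFeed about.
    active_users = 6_000
    print(min(total_requests) / active_users)  # hundreds of polls per potentially active user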

Evan and Kellan's talk suggests that instead of Flickr getting almost 3 million requests from FriendFeed, it would be a more efficient model for FriendFeed to tell Flickr which users they are interested in and then listen for updates from Flickr when they upload photos.

They are right. The interaction between Flickr and FriendFeed should actually be a publish-subscribe relationship instead of a polling relationship. Polling is a good idea for RSS/Atom for a few reasons

  • there are thousands to hundreds of thousands of clients that might be interested in a resource so the server keeping track of subscriptions is prohibitively expensive
  • a lot of these end points aren't persistently connected (i.e. your desktop RSS reader isn't always running)
  • RSS/Atom publishing is as simple as plopping a file in the right directory and letting IIS or Apache work its magic

The situation between FriendFeed and Flickr is almost the exact opposite. Instead of thousands of clients interested in a single document, we have one subscriber interested in thousands of documents. Both end points are always on, or are at least expected to be. The cost of developing a publish-subscribe model is one that both sides can afford.
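The publish-subscribe model being suggested is easy to sketch. In the toy Python example below, all names are invented for illustration; in practice the subscribe and publish steps would be API calls or XMPP messages between the two services, not in-process function calls. The point is that the subscriber registers its interest once and is only contacted when something actually happens.

    from collections import defaultdict

    class UpdatePublisher:
        def __init__(self):
            self.subscribers = defaultdict(list)  # user_id -> list of callbacks

        def subscribe(self, user_id, callback):
            """A consumer registers interest in one user's updates."""
            self.subscribers[user_id].append(callback)

        def publish(self, user_id, update):
            """Called only when a user actually uploads something."""
            for callback in self.subscribers[user_id]:
                callback(user_id, update)

    # Usage: the subscriber registers once, then does no polling at all.
    flickr = UpdatePublisher()
    flickr.subscribe("alice", lambda user, update: print(f"notify FriendFeed: {user} posted {update}"))
    flickr.publish("alice", "photo_123.jpg")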

Thus this isn't a case of REST not scaling, as implied by Evan and Kellan's talk. This is a case of using the wrong tool to solve your problem because it happens to work well in a different scenario. The above talk suggests using XMPP, which is an instant messaging protocol, as the publish-subscribe mechanism. In response to the talk, Joshua Schachter (founder of del.icio.us) suggested a less heavyweight publish-subscribe mechanism using a custom API in his post entitled beyond REST. My suggestion for people who believe they have this problem would be to look at using some subset of XMPP and to experiment with off-the-shelf tools before rolling your own solution. Of course, this is an approach that totally depends on network effects. Today everyone has RSS/Atom feeds while very few services use XMPP. There isn't much point in investing in publishing over XMPP if your key subscribers can't consume it and vice versa. It will be interesting to see if the popular "Web 2.0" companies can lead the way in using XMPP for publish-subscribe of activity streams from social networks in the same way they kicked off our love affair with RESTful Web APIs.

It should be noted that there are already some "Web 2.0" companies using XMPP as a way to provide a stream of updates to subscribing services to prevent the overload that comes from polling. For example, Twitter has confirmed that it provides an XMPP stream to FriendFeed, Summize, Zappos, Twittervision and Gnip. However they simply dump out every update that occurs on Twitter to these services instead of having these services subscribe to updates for specific users. This approach is quite inefficient and brings its own set of scaling issues.

The interesting question is why people are only now bringing this up. Shouldn't people have already been complaining about Web-based feed readers like Google Reader and Bloglines for causing the same kinds of problems? I can only imagine how many millions of times a day Google Reader must fetch content from TypePad and Wordpress.com, but I haven't seen explicit complaints about this issue from folks like Anil Dash or Matt Mullenweg.

Now Playing: The Pussycat Dolls - When I Grow Up


 

Disclaimer: This post does not reflect the opinions, thoughts, strategies or future intentions of my employer. These are solely my personal opinions. If you are seeking official position statements from Microsoft, please go here.

Earlier this week, David Recordon announced the creation of the Open Web Foundation at OSCON 2008. His presentation is embedded below

From the organization's Web site you get the following outline of its mission

The Open Web Foundation is an attempt to create a home for community-driven specifications. Following the open source model similar to the Apache Software Foundation, the foundation is aimed at building a lightweight framework to help communities deal with the legal requirements necessary to create successful and widely adopted specification.

The foundation is trying to break the trend of creating separate foundations for each specification, coming out of the realization that we could come together and generalize our efforts. The details regarding membership, governance, sponsorship, and intellectual property rights will be posted for public review and feedback in the following weeks.

Before you point out that this seems to create yet another "standards" organization for Web technology, there are already canned answers to this question. Google evangelist Dion Almaer provides justification for why existing Web standards organizations do not meet their needs in his post entitled The Open Web Foundation; Apache for the other stuff where he writes 

Let’s take an example. Imagine that you came up with a great idea, something like OAuth. That great idea gains some traction and more people want to get involved. What do you do? People ask about IP policy, and governance, and suddenly you see yourself on the path of creating a new MyApiFoundation.

Wait a minute! There are plenty of standards groups and other organizations out there, surely you don’t have to create MyApiFoundation?

Well, there is the W3C and OASIS, which are pay to play orgs. They have their place, but MyApi may not fit in there. The WHATWG has come up with fantastic work, but the punting on IP is an issue too.

At face value, it's hard to argue with this logic. The W3C charges fees using a weird progressive taxation model where a company pays anything from a few hundred to several thousand dollars depending on how the W3C assesses their net worth. OASIS similarly charges from $1,000 to $50,000 depending on how much influence the member company wants to have in the organization. After that it seems there are a bunch of one-off organizations like the OpenID Foundation and the WHATWG that are dedicated to a specific technology.

Or so the spin from the Open Web Foundation would have you believe.

In truth there is already an organization dedicated to producing "Open" Web technologies that has a well thought out policy on membership, governance, sponsorship and intellectual property rights and that isn't pay to play. This is not a new organization; it actually happens to be older than David Recordon, who unveiled the Open Web Foundation.

The name of this organization is the Internet Engineering Task Force (IETF). If you are reading this blog post then you are using technologies for the "Open Web" created by the IETF. You may be reading my post in a Web browser in which case the content was transferred to you over HTTP (RFC 2616) and if you're reading it in an RSS reader then I should add that you're also directly consuming my Atom feed (RFC 4287). Some of you are reading this post because someone sent you an email which is another example of an IETF protocol at work, SMTP (RFC 2821).

The IETF policy on membership doesn't get more straightforward: join a mailing list. I am listed as a member of the Atom working group in RFC 4287 because I was a participant in the atom-syntax mailing list. The organization has a well thought out and detailed policy on intellectual property rights as it relates to IETF specifications, which is detailed in RFC 3979: Intellectual Property Rights in IETF Technology and slightly updated in RFC 4879: Clarification of the Third Party Disclosure Procedure in RFC 3979.

I can understand that a bunch of kids fresh out of college are ignorant of the IETF and believe they have to reinvent the wheel to Save the Open Web, but I am surprised that Google, which has had several of its employees participate in the IETF processes that created RFC 4287, RFC 4959, RFC 5023 and RFC 5034, would join in this behavior. Why would Google decide to sponsor a separate standards organization that competes with the IETF, has less inclusive processes than the IETF, no clear idea of how corporate sponsorship will work and a yet to be determined IPR policy?

That's just fucking weird.

Now Playing: Boyz N Da Hood - Dem Boys (remix) (feat T.I. & The Game)


 

Categories: Technology

I've been using the redesigned Facebook profile and homepage for the past few days and thought it would be useful to write up my impressions of the changes. Facebook is now the world's most popular social networking site and one of the ways they've gotten there is by being very focused on listening to their users and improving their user experience based on this feedback. Below are screenshots of the old and new versions of the pages and a discussion of which elements have changed and the user scenarios the changes are meant to improve.

Homepage Redesign

OLD HOME PAGE:

NEW HOME PAGE:

The key changes and their likely justifications are as follows

  • Entry points for creating content are now at the top of the news feed. One of the key features driving user engagement on Facebook is the News Feed. This lets users know what is going on with their social network as soon as they log on to the site. In a typical example of network effects at work, one person creates some content by uploading a photo or sharing a link and hundreds of people on their friend list benefit by having content to view in their News Feed. If any of the friends responds to the content this again benefits hundreds of people and so on. The problem with the old home page was that a user sees her friends uploading photos and sharing links and may want to do so as well, but there is no easy way for her to figure out how to do the same thing without having to go two or three clicks away from the home page. The entry points at the top of the feed will encourage more "impulse" content creation.

  • Left sidebar is gone. There were three groups of items in the left nav: a search box, the list of a user's most frequently accessed applications and an advertisement. The key problem is that the ad is in a bottom corner of the feed. This makes it easy for users to mentally segregate that part of the screen from their vision and either never look there or completely ignore it. Removing that visual ghetto and moving ads to being inline with the feed makes it more likely that users will look at the ad. Ah, but now you need more room to show the ad (all the space isn't needed for news feed stories). So the other elements of the left nav are moved: the search box to the header and the list of most accessed applications to the sidebar on the right. Now you have enough room to stretch out the News Feed's visible area and advertisers can reuse their horizontal banner ads on Facebook, even though this makes the existing feed content look awkward. This is one place where monetization trumped usability.

  • Comments now shown inline for News Feed items with comments (not visible in screen shot). This may be the feature that made Mike Arrington decide to call the new redesign the FriendFeedization of Facebook. Sites like FriendFeed have proven that showing the comments on an item in the feed inline gives users more content to view in their feeds and increases the likelihood of engagement since the user may want to join the conversation.

Profile Redesign

OLD PROFILE:

NEW PROFILE:

The key changes and their likely justifications are as follows

  • The profile now has a tabbed model for navigation. This is a massive improvement for a number of reasons. The most important one is that in the old profile, there is a lot of content below the fold. My old profile page is EIGHT pages when printed as opposed to TWO pages when the new profile page is printed. Moving to a tabbed model (i) improves page load times and (ii) increases the number of page views and hence ad impressions.

  • The Mini-Feed and the Wall have been merged. The intent here is to give more visibility to the Wall which in the old model was below the fold. The "guest book" or wall is an important part of the interaction model in social networking sites (see danah boyd's Friendster lost steam. Is MySpace just a fad? essay) and Facebook was de-emphasizing theirs in the old model.

  • Entry points for creating content are at the top of the profile page. Done for the same reason as on the Home page. You want to give users lots of entry points for adding content to the site so that they can kick off network effects by generating content which in turn generates tasty page views.

  • Left sidebar is gone. Again the left sidebar is gone and the advertisement is moved closer to the content, and away from the visual ghetto that is the bottom left of the screen. Search box and most accessed applications are now in the header as well. The intent here is also to improve the likelihood that users will view and react to the ads.

Now Playing: Da Back Wudz - I Don't Like The Look Of It (Oompa)


 

Yesterday Amazon's S3 service had an outage that lasted about six hours. Unsurprisingly this has led to a bunch of wailing and gnashing of teeth from the very same pundits that were hyping the service a year ago. The first person to proclaim that the sky is falling is Richard MacManus in his post More Amazon S3 Downtime: How Much is Too Much?, where he writes

Today's big news is that Amazon's S3 online storage service has experienced significant downtime. Allen Stern, who hosts his blog's images on S3, reported that the downtime lasted over 6 hours. Startups that use S3 for their storage, such as SmugMug, have also reported problems. Back in February this same thing happened. At the time RWW feature writer Alex Iskold defended Amazon, in a must-read analysis entitled Reaching for the Sky Through The Compute Clouds. But it does make us ask questions such as: why can't we get 99% uptime? Or: isn't this what an SLA is for?

Om Malik joins in on the fun with his post S3 Outage Highlights Fragility of Web Services which contains the following

Amazon’s S3 cloud storage service went offline this morning for an extended period of time — the second big outage at the service this year. In February, Amazon suffered a major outage that knocked many of its customers offline.

It was no different this time around. I first learned about today’s outage when avatars and photos (stored on S3) used by Twinkle, a Twitter-client for iPhone, vanished.

That said, the outage shows that cloud computing still has a long road ahead when it comes to reliability. NASDAQ, Activision, Business Objects and Hasbro are some of the large companies using Amazon’s S3 Web Services. But even as cloud computing starts to gain traction with companies like these and most of our business and communication activities are shifting online, web services are still fragile, in part because we are still using technologies built for a much less strenuous web.

Even though the pundits are trying to raise a stink, the people who should be most concerned about this are Amazon S3's customers. Counter to Richard MacManus's claim, not only is there a Service Level Agreement (SLA) for Amazon S3, it promises 99.9% uptime or you get a partial refund. Six hours of downtime sounds like a lot until you realize that 99% uptime still allows roughly seven hours of downtime a month and over three and a half days of downtime a year. Amazon S3 is definitely doing a lot better than that.
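For reference, the downtime budgets behind those percentages are easy to compute. A quick sketch, assuming a 30-day month:

    # Downtime allowed at a given uptime percentage over a given period.
    HOURS_PER_MONTH = 30 * 24
    HOURS_PER_YEAR = 365 * 24

    def downtime_allowed(uptime_percent, period_hours):
        """Hours of downtime permitted in a period at a given uptime percentage."""
        return (1 - uptime_percent / 100) * period_hours

    print(downtime_allowed(99.0, HOURS_PER_MONTH))  # ~7.2 hours per month
    print(downtime_allowed(99.0, HOURS_PER_YEAR))   # ~87.6 hours (~3.65 days) per year
    print(downtime_allowed(99.9, HOURS_PER_YEAR))   # ~8.8 hours per year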

The only question that matters is whether Amazon's customers can get better service elsewhere at the prices Amazon charges. If they can't, then this is an acceptable loss which is already covered by their SLA. 99.9% uptime still means over eight hours of downtime a year. And if they can, it will put competitive pressure on Amazon to do a better job of managing their network or lower their prices.

This is one place where market forces will rectify things or we will reach a healthy equilibrium. Network computing is inherently fragile and no amount of outraged posts by pundits will ever change that. Amazon is doing a better job than most of its customers could do on their own, for cheaper than they could ever do it on their own. Let's not forget that in the rush to gloat about Amazon's downtime.

Now Playing: 2Pac - Life Goes On


 

Categories: Web Development