A few weeks ago, I bookmarked a post from Sam Ruby entitled Collapsing the Stack where he wrote:

Werner Vogels: Yep, the best way to completely automate operations is to have the developers be responsible for running the software they develop. It is painful at times, but also means considerable creativity gets applied to a very important aspect of the software stack. It also brings developers into direct contact with customers and a very effective feedback loop starts. There is no separate operations department at Amazon: you build it; you run it.

Sounds like a very good idea.

I don't see how this sounds like a good idea. This reminds me of a conversation I once had with someone at Microsoft who thought it would be a good idea to get rid of their test team and replace them all with developers once they moved to Test Driven Development. I used to be a tester when I first joined Microsoft and this seemed to me to be the kind of statement made by someone who assumed that the only thing testers do is write unit tests. Good test teams don't just write unit tests. They develop and maintain test tools. They perform system integration testing. They manage the team's test beds and test labs. They are the first line of defence when attempting to reproduce customer bug reports before pulling in developers who may be working on your next release. All of this can be done by the development team but it means that your developers spend less time developing and more time testing. This cost will show up either as an increase in the amount of time it takes to get to market or a reduction in quality if schedules are not adjusted to account for this randomization of the development team. Eventually you'll end up recreating your test team so there are specific people responsible for test-related activities [which is why software companies have test teams in the first place].

The same reasoning applies to the argument for folding the responsibilities of your operations team into the development team's tasks. A good operations team isn't just responsible for the deployment/setup of applications on your servers and monitoring the health of the Web servers or SQL databases in your web farm. A good operations team is involved in designing your hardware SKUs and understanding your service's peak capacity so as to optimize purchase decisions. A good operations team makes the decisions around your data centers from picking locations with the best power prices and ensuring that you're making optimal use of all the physical space in your data center to making build . A good operations team is the first line of defence when your service is being hit by a Denial of Service attack. A good operations team insulates the team from worrying about operating system, web server or database patches being made to the live site. A good operations team is involved in the configuration, purchase, deployment and [sometimes] development of load balancing, database partitioning and database replication tools. Again, you can have your development team do this but eventually it would seem to make sense that these tasks be owned by specific individuals instead of splitting them across the developers building one's core product.

PS: I've talked to a bunch of folks who know ex-Amazon developers and they tend to agree with my analysis above. I'd be interested in getting the perspective of ex-Amazon developers like Greg Linden on replacing your operations team with your core development staff.

PPS: Obviously this doesn't apply if you are a small 2 to 5 person startup. Everyone ends up doing everything anyway. :)


 

Wednesday, 09 August 2006 07:24:15 (GMT Daylight Time, UTC+01:00)
Your "PPS" is the key point: keeping things small and having your developers be responsible for their stuff go hand in hand.
pwb
Wednesday, 09 August 2006 08:53:12 (GMT Daylight Time, UTC+01:00)
I worked in a company where developers were responsible for their products. We did everything from talking to customers to testing to helping them with implementation... And after having completed some projects, there's only one solution to actually keep coding: leave the company. After maybe ten medium-sized projects there's so much maintenance work that you can't do anything other than answer the phone every few minutes. You just can't get into work anymore, which is very frustrating.
Wednesday, 09 August 2006 09:53:16 (GMT Daylight Time, UTC+01:00)
I second your thoughts, Dare. What would be the point of creating branches in software development if it was to attribute them all to one type of software engineer?

IMO a more sensible thing to do would be to improve (and sometimes simply enable) communication between developers and testers. They should work hand in hand but are too often fighting each other. Developers should understand how technical and difficult being a tester is and testers should understand the pressure of being a developer.

- Sylvain
Thursday, 10 August 2006 18:43:34 (GMT Daylight Time, UTC+01:00)
"attribute them all to one type of software engineer"

That's not what's happening. It's that the *group* that makes the software is responsible for running it. That group can include whatever types of people the group leader wishes.
pwb
Monday, 21 August 2006 23:34:35 (GMT Daylight Time, UTC+01:00)
I don't think that the TDD anecdote is particularly relevant - TDD is a development approach, not a test approach, and anybody who thinks that TDD means you don't need a test team is misguided.

Or, to put it another way, TDD tests are not the kinds of tests that QA people write. Though it *is* possible to have an org where developers do write both TDD tests and "QA" tests, and most companies don't have real test departments the way MS does...

On the overall point, I'm a big advocate of making developers feel the pain of their choices. I'm not sure getting rid of operations is the right choice, but neither is insulating them from the pain that they're causing others, pain that can often be reduced considerably with a small amount of effort.

My experience is that insulation usually causes more problems than it solves. Sure, the developers get more done, but what they create is usually much less useful overall.
