A few members of the Hotmail/Windows Live Mail team have been writing about scalability recently.

From the ACM Queue article A Conversation with Phil Smoot:

BF Can you give us some sense of just how big Hotmail is and what the challenges of dealing with something that size are?

PS Hotmail is a service consisting of thousands of machines and multiple petabytes of data. It executes billions of transactions over hundreds of applications agglomerated over nine years—services that are built on services that are built on services. Some of the challenges are keeping the site running: namely dealing with abuse and spam; keeping an aggressive, Internet-style pace of shipping features and functionality every three and six months; and planning how to release complex changes over a set of multiple releases.

QA is a challenge in the sense that mimicking Internet loads on our QA lab machines is a hard engineering problem. The production site consists of hundreds of services deployed over multiple years, and the QA lab is relatively small, so re-creating a part of the environment or a particular issue in the QA lab in a timely fashion is a hard problem. Manageability is a challenge in that you want to keep your administrative headcount flat as you scale out the number of machines.

BF I have this sense that the challenges don’t scale uniformly. In other words, are there certain scaling points where the problem just looks completely different from how it looked before? Are there things that are just fundamentally different about managing tens of thousands of systems compared with managing thousands or hundreds?

PS Sort of, but we tend to think that if you can manage five servers you should be able to manage tens of thousands of servers and hundreds of thousands of servers just by having everything fully automated—and that all the automation hooks need to be built in the service from the get-go. Deployment of bits is an example of code that needs to be automated. You don’t want your administrators touching individual boxes making manual changes. But on the other side, we have roll-out plans for deployment that smaller services probably would not have to consider. For example, when we roll out a new version of a service to the site, we don’t flip the whole site at once.

We do some staging, where we’ll validate the new version on a server and then roll it out to 10 servers and then to 100 servers and then to 1,000 servers—until we get it across the site. This leads to another interesting problem, which is versioning: the notion that you have to have multiple versions of software running across the sites at the same time. That is, version N and N+1 clients need to be able to talk to version N and N+1 servers and N and N+1 data formats. That problem arises as you roll out new versions or as you try different configurations or tunings across the site.

Another hard problem is load balancing across the site. That is, ensuring that user transactions and storage capacity are equally distributed over all the nodes in the system without any particular set of nodes getting too hot.
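
As an aside, the versioning problem Smoot describes is easy to make concrete. Below is a minimal C# sketch (hypothetical types and field names of my own, not anything from the interview) of a reader that has to accept both version N and version N+1 of a record while a staged rollout is in flight:

    using System;

    // Hypothetical record: version 1 (N) has only a Subject field,
    // version 2 (N+1) adds a Priority field.
    class MessageRecord
    {
        public int FormatVersion;
        public string Subject;
        public string Priority; // only populated by version N+1 writers
    }

    class MessageHandler
    {
        // While a rollout is staged across the site, records written by both
        // the old and the new code are live at the same time, so readers have
        // to accept either format.
        public static string Describe(MessageRecord record)
        {
            switch (record.FormatVersion)
            {
                case 1:
                    // Version N: the field does not exist yet, assume a default.
                    return record.Subject + " (priority: normal)";
                case 2:
                    // Version N+1: honor the new field if it was set.
                    return record.Subject + " (priority: " + (record.Priority ?? "normal") + ")";
                default:
                    throw new NotSupportedException("Unknown format version: " + record.FormatVersion);
            }
        }
    }

The same idea applies to clients and servers: anything that reads version N data has to keep working until the last machine is running version N+1.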

From the blog post entitled Issues with .NET Frameworks 2.0 by Walter Hsueh:

Our team is tackling the scale issues, delving deep into the CLR and understanding its behavior.  We've identified at least two issues in .NET Frameworks 2.0 that are "low-hanging fruit", and are hunting for more.

1a)  Regular Expressions can be very expensive.  Certain (unintended and intended) strings may cause RegExes to exhibit exponential behavior.  We've taken several hotfixes for this.  RegExes are so handy, but devs really need to understand how they work; we've gotten bitten by them.
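
To make the point concrete, here is a small C# sketch (my own illustrative pattern, not one taken from Hsueh's post) showing how a nested quantifier can exhibit exponential backtracking on inputs that almost match:

    using System;
    using System.Diagnostics;
    using System.Text.RegularExpressions;

    class RegexBacktrackingDemo
    {
        static void Main()
        {
            // A nested quantifier such as (a+)+ gives the engine many ways to
            // partition the input. On a string that almost matches but ultimately
            // fails, the engine tries them all, so the work roughly doubles with
            // each additional character.
            Regex pathological = new Regex(@"^(a+)+$");
            string input = new string('a', 22) + "!";

            Stopwatch watch = Stopwatch.StartNew();
            bool matched = pathological.IsMatch(input);
            watch.Stop();
            Console.WriteLine("Pathological pattern: {0} in {1} ms", matched, watch.ElapsedMilliseconds);

            // An equivalent pattern with a single, unambiguous quantifier fails
            // the same input in linear time.
            Regex linear = new Regex(@"^a+$");
            Console.WriteLine("Linear pattern: {0}", linear.IsMatch(input));
        }
    }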

1b)  Designing an AJAX-style browser application (like most engineering problems) involves trading one problem for another.  We can choose to shift the application burden from the client onto the server.  In the case of RegExes, it might make sense to move them to the client (where CPU can be freely used) instead of having them run on the server (where you have to share).  Windows Live Mail made this tradeoff in one case.

2)  Managed Thread Local Storage (TLS) is expensive.  There is a global lock in the Whidbey RTM implementation of Thread.GetData/Thread.SetData which causes scalability issues.  Recommendation is to use the [ThreadStatic] attribute on static class variables.  Our RPS went up, our CPU % went down, context switches dropped by 50%, and lock contentions dropped by over 80%.  Good stuff.
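
As a rough sketch of that recommendation (hypothetical class and field names, not code from the post), the change amounts to replacing the data-slot TLS calls with a [ThreadStatic] static field:

    using System;
    using System.Threading;

    static class RequestContext
    {
        // The slower approach on .NET 2.0: Thread.GetData/SetData go through an
        // allocated data slot, and the Whidbey RTM implementation serializes
        // these calls on a global lock.
        static readonly LocalDataStoreSlot Slot = Thread.AllocateDataSlot();

        public static object GetViaDataSlot()
        {
            return Thread.GetData(Slot);
        }

        public static void SetViaDataSlot(object value)
        {
            Thread.SetData(Slot, value);
        }

        // The recommended approach: a [ThreadStatic] static field gives each
        // thread its own copy of the variable with no slot lookup and no
        // global lock on the hot path.
        [ThreadStatic]
        static object perThreadValue;

        public static object GetViaThreadStatic()
        {
            return perThreadValue;
        }

        public static void SetViaThreadStatic(object value)
        {
            perThreadValue = value;
        }
    }

Avoiding the slot lookup and its lock on every access is consistent with the drop in lock contention and context switches reported above.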

Our devs have also started migrating some of our services to Whidbey and have found some interesting performance issues along the way. It would probably be a good idea to put together some sort of "lessons learned while building mega-scale services on the .NET Framework" article.


 

Categories: Windows Live

Recently we had some availability issues with MSN Spaces, which caused complaints from some of our loyal customers. Mike Torres addresses these issues in his post Performance & uptime, which states:

One of the hardest parts about running a worldwide service with tens of millions of users is maintaining service performance and overall uptime.  As a matter of fact, a member of our team (Dare) had some thoughts about this not too long ago.  While we're constantly working towards 100% availability and providing the world's fastest service, sometimes we run into snags along the way that impact your experience with MSN Spaces.
 
That seems to have happened yesterday.  For the networking people out there, it turned out to be a problem with a load balancing device resulting in packet loss (hence the overall slowness of the site).  After some investigation, the team was able to determine the cause and restore the site back to normal.
 
Rest assured that as soon as the service slows down even a little bit, or it becomes more difficult to reach individual spaces, we're immediately aware of it here within our service operations center.  Within minutes we have people working hard to restore things to their normal speedy and reliable state.  Of course, sometimes it takes a little while to get things back to normal - but don't believe for a second that we aren't aware or concerned about the problem.  As a matter of fact, almost everyone on our team uses Spaces daily (surprise!) so we are just as frustrated as you are when things slow down.  So I'm personally sorry if you were frustrated yesterday - I know I was!  We are going to continue to do everything we can to minimize any impact on your experience...  most of the time we'll be successful and every once in a while we won't.  But it's our highest priority and you have a firm commitment from us to do so.

I'm glad to see us being more transparent about what's going on with our services. This is a good step.


 

Categories: MSN

Over a year ago, I wrote a blog post entitled SGML on the Web: A Failed Dream? where I asked whether the original vision of XML had failed. Below are excerpts from that post:

The people who got together to produce the XML 1.0 recommendation were motivated to do this because they saw a need for SGML on the Web. Specifically
their discussions focused on two general areas:
  • Classes of software applications for which HTML was an inadequate information format
  • Aspects of the SGML standard itself that impeded SGML's acceptance as a widespread information technology

The first discussion established the need for SGML on the web. By articulating worthwhile, even mission-critical work that could be done on the web if there were a suitable information format, the SGML experts hoped to justify SGML on the web with some compelling business cases.

The second discussion raised the thornier issue of how to "fix" SGML so that it was suitable for the web.

And thus XML was born.
...
The W3C's attempts to get people to author XML directly on the Web have mostly failed, as can be seen from the dismal adoption rate of XHTML, and in fact many [including myself] have come to the conclusion that the benefits of adopting XHTML are too low, if not non-existent, compared to the costs. There was once an expectation that content producers would be able to place documents conforming to their own XML vocabularies on the Web, with display handled entirely by stylesheets, but this has yet to become widespread. In fact, at least one member of a W3C working group has called this a bad practice, since it means that User Agents that aren't sophisticated enough to understand style sheets are left out in the cold.

Interestingly enough, although XML has not been as successful as its originators initially expected as a markup language for authoring documents on the Web, it has found significant success as the successor to the Comma Separated Value (CSV) file format. XML's primary usage on the Web, and even within internal networks, is for exchanging machine-generated, structured data between applications. Speculatively, the largest usage of XML on the Web today is RSS, and it conforms to this pattern.

These thoughts were recently rekindled when reading Tim Bray's recent post Don't Invent XML Languages, in which he argues that people should stop designing new XML formats. For designing new data formats for the Web, he advocates the use of Microformats instead of XML.

The vision behind microformats is completely different from the XML vision. The original XML inventors started with the premise that HTML is not expressive enough to describe every possible document type that would be exchanged on the Web. Proponents of microformats argue that one can embed additional semantics in HTML, and thus that HTML is expressive enough to represent every possible document type that could be exchanged on the Web. I've always considered it a gross hack to think that instead of having an HTML web page for my blog and an Atom/RSS feed, I should have a single HTML page with <div class="rss:item"> or <h3 class="atom:title"> embedded in it. However, given that one of the inventors of XML (Tim Bray) is now advocating this approach, I wonder if I'm simply clinging to old ways and have become the kind of intellectual dinosaur I bemoan.


 

Categories: XML

The documentation for the implementation of the MetaWeblog API for MSN Spaces is now available on MSDN.

Developers interested in building applications that can be used to create, edit or delete blog posts on a space should read the documentation about the MetaWeblogAPI and MSN Spaces. Questions about the API should be directed to the MSN Spaces Development forum.
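
For a sense of what such an application does on the wire, here is a minimal C# sketch of a metaWeblog.newPost call against the documented endpoint. This is illustrative only, not sample code from the MSDN documentation: the XML-RPC plumbing is hand-rolled, and the assumption that the space name doubles as the blogid is mine. The user name is the space name and the password is the Email Publishing secret word, as described in the documentation.

    using System;
    using System.IO;
    using System.Net;
    using System.Security;
    using System.Text;

    class MetaWeblogClient
    {
        // Endpoint documented for MSN Spaces.
        const string EndPoint = "https://storage.msn.com/StorageService/MetaWeblog.rpc";

        // Builds and sends a minimal metaWeblog.newPost request.
        // userName is the space name, password the Email Publishing secret word.
        static string NewPost(string userName, string password, string title, string description)
        {
            StringBuilder body = new StringBuilder();
            body.Append("<?xml version=\"1.0\"?>");
            body.Append("<methodCall><methodName>metaWeblog.newPost</methodName><params>");
            body.AppendFormat("<param><value><string>{0}</string></value></param>", SecurityElement.Escape(userName)); // blogid (assumed here to be the space name)
            body.AppendFormat("<param><value><string>{0}</string></value></param>", SecurityElement.Escape(userName));
            body.AppendFormat("<param><value><string>{0}</string></value></param>", SecurityElement.Escape(password));
            body.Append("<param><value><struct>");
            body.AppendFormat("<member><name>title</name><value><string>{0}</string></value></member>", SecurityElement.Escape(title));
            body.AppendFormat("<member><name>description</name><value><string>{0}</string></value></member>", SecurityElement.Escape(description));
            body.Append("</struct></value></param>");
            body.Append("<param><value><boolean>1</boolean></value></param>"); // publish immediately
            body.Append("</params></methodCall>");

            byte[] payload = Encoding.UTF8.GetBytes(body.ToString());

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(EndPoint);
            request.Method = "POST";
            request.ContentType = "text/xml";
            request.ContentLength = payload.Length;
            using (Stream requestStream = request.GetRequestStream())
            {
                requestStream.Write(payload, 0, payload.Length);
            }

            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd(); // XML-RPC response containing the new post's id
            }
        }
    }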

PS: I forgot to blog about this over the holidays but astute folks who've been watching http://msdn.microsoft.com/msn already found this out.


 

Categories: Windows Live

January 11, 2006
@ 03:54 AM
The following is a tutorial on posting to your blog on MSN Spaces using Flickr.
  1. Create a Space on http://spaces.msn.com if you don't have one

  2. Go to 'Edit Your Space->Settings->Email Publishing'

  3. Turn on Email Publishing (screenshot below)

  4. Choose a secret word (screenshot below)

  5. Create an account on Yahoo or Flickr if you don't have one

  6. Go to the "Add a Weblog" page at http://flickr.com/blogs_add_metaweblogapi.gne

  7. Specify the API end point as https://storage.msn.com/StorageService/MetaWeblog.rpc. The user name is the name of your space (e.g. I use 'carnage4life' because the URL of my space is http://spaces.msn.com/members/carnage4life). The password is the secret word you selected when you turned on Email Publishing on your space.

  8. Click "Next", then Click "All Done"

  9. Go ahead and create blog posts on your space directly from Flickr. You can see the test post I made to my space from Flickr here.

PS: Thanks to the Yahoo! folks on the Flickr team who helped debug the issues that prevented this from working when we first shipped our MetaWeblog API support.


 

Categories: Windows Live

January 10, 2006
@ 04:27 PM

In his blog post Windows DVD Maker Not Managed Code Charles Cook writes

Last week Eric Gunnerson mentioned that he has been working on an application for Vista: Windows DVD Maker. Yesterday he posted a FAQ for the application. The answer to question 4 was disappointing:

4: Is DVD Maker written in managed code?

A: No. Yes, it is ironic that I spent so much time on C# and then spent a ton of time writing something in C++ code. Everybody on the team is a believer in managed code, and we hope we'll be able to use it for future projects.

Given that there is a whole new set of APIs in Vista for writing managed applications - Avalon, WinFX, etc - why has a new self-contained app like this been written in unmanaged C++? Actually writing real applications, instead of just samples, with the new managed APIs would be far more convincing than any amount of hype from Robert Scoble.

I agree with Charles. If Microsoft believes in managed code, we should be building real applications on the .NET Framework. As it turns out, we do.

In his post Cha-Cha-Changes Dan Fernandez wrote

The Microsoft's not using Managed Code Myth
One of the biggest challenges in my old job was that customers didn't think Microsoft was using managed code. Well, the truth is that we have a good amount of managed code in the three years that the .NET Framework has been released including operating systems, client tools, Web properties, and Intranet applications. For those of you that refuse to believe, here's an estimate of the lines of managed code in Microsoft applications that I got permission to blog about:

  • Visual Studio 2005: 7.5 million lines
  • SQL Server 2005: 3 million lines
  • BizTalk Server: 2 million lines
  • Visual Studio Team System: 1.7 million lines
  • Windows Presentation Foundation: 900K lines
  • Windows Sharepoint Services: 750K lines
  • Expression Interactive Designer: 250K lines  
  • Sharepoint Portal Server: 200K lines
  • Content Management Server: 100K lines

We also use managed code for the online services that power various MSN and Windows Live properties, from Windows Live Messenger and Windows Live Mail to Live.com and Windows Live Expo. I find it surprising that people continue to think we don't use managed code at Microsoft.


 

Categories: Life in the B0rg Cube

There are a couple of contentious topics I tend not to bother debating online because people on both sides of the argument tend to have entrenched positions. The debate on abortion in the U.S. is an example of such a topic. Another one for me is DRM and its sister topics: copyright infringement, piracy, and file-sharing networks.

Shelley Powers doesn't seem to share my aversion to these topics and has written an insightful post entitled Debate on DRM, which contains the following excerpt:

Doc Searls points to a weblog post by the Guardian Unlimited’s Lloyd Shepherd on DRM and says it’s one of the most depressing things he’s read. Shepherd wrote:

I’m not going to pick a fight with the Cory Doctorows of the world because they’re far more informed and cleverer than me, but let’s face it: we’re going to have to have some DRM. At some level, there has to be an appropriate level of control over content to make it economically feasible for people to produce it at anything like an industrial level. And on the other side of things, it’s clear that the people who make the consumer technology that ordinary people actually use - the Microsofts and Apples of the world - have already accepted and embraced this. The argument has already moved on.

Doc points to others making arguments in refutation of Shepherd’s thesis (Tom Coates and Julian Bond), and ends his post with:

We need to do with video what we’ve started doing with music: building a new and independent industry...


I don’t see how DRM necessarily disables independents from continuing their efforts. Apple has invested in iTunes and iPods, but one can still listen to other formats and subscribe to other services from a Mac. In fact, what Shepard is proposing is that we accept the fact that companies like Apple and Google and Microsoft and Yahoo are going to have these mechanisms in place, and what can we do to ensure we continue to have options on our desktops?

There’s another issue though that’s of importance to me in that the concept of debate being debated (how’s this for a circular discussion). The Cluetrain debate method consists of throwing pithy phrases at each other over (pick one): spicey noodles in Silicon Valley; a glass of ale in London; something with bread in Paris; a Boston conference; donuts in New York. He or she who ends up with the most attention (however attention is measured) wins.

In Doc’s weblog comments, I wrote:

What debate, though? Those of us who have pointed out serious concerns with Creative Commons (even demonstrating problems) are ignored by the creative commons people. Doc, you don’t debate. You repeat the same mantra over and over again: DRM is bad, openness is good. Long live the open internet (all the while you cover your ears with your hands and hum “We are the Champions” by Queen under your breath).

Seems to me that Lloyd Shepherd is having the debate you want. He’s saying, DRM is here, it’s real, so now how are we going to come up with something that benefits all of us?

Turning around going, “Bad DRM! Bad!” followed by pointing to other people going “Bad DRM! Bad!” is not an effective response. Neither is saying how unprofitable it is, when we only have to turn our little eyeballs over to iTunes to generate an “Oh, yeah?”

Look at the arguments in the comments to Shepherd’s post. He is saying that as a business model, we’re seeing DRM work. The argument back is that the technology fails. He’s talking ‘business’ and the response is ‘technology’. And when he tries to return to business, the people keep going back to technology (with cries of ‘…doomed to failure! Darknet!’).

The CES you went to showed that DRM is happening. So now, what can we do to have input into this to ensure that we’re not left with orphaned content if a particular DRM goes belly up? That we have fair use of the material? If it is going to exist, what can we do to ensure we’re not all stuck with betamax when the world goes VHS?

Rumbles of ‘darknet’, pointers to music stores that feature few popular artists, and clumsy geeky software as well as loud hyperbole from what is a small majority does not make a ‘debate’. Debate is acknowledging what the other ’side’ is saying, and responding accordingly. Debate requires some openness.

There is reason to be concerned about DRM (Digital Rights Management–using technology to restrict access to specific types of media). If operating systems begin to limit what we can and cannot use to view or create certain types of media; if search engine companies restrict access to specific types of files; if commercial competition means that me having an iPod, as compared to some other device, limits the music or services at other companies I have access to, we are at risk in seeing certain components of the internet torn into pieces and portioned off to the highest bidders.

But by saying that all DRM is evil and that only recourse we have is to keep the Internet completely free, and only with independents will we win and we will win, oh yes we will–this not only disregards the actuality of what’s happening now, it also disregards that at times, DRM can be helpful for those not as well versed in internet technologies.

I tend to agree with Shelley 100% [as usual]. As much as geeks hate to admit it, DRM is here to stay. The iTunes/iPod combination has shown that consumers will accept DRM in situations where they are provided value, and that the business model can be profitable. And, as Lloyd Shepherd points out, the major technology companies, from Microsoft and Intel to Apple and Google, are all building support for DRM into their products for purchasing and/or consuming digital media.

Absolutists who argue that DRM is evil and should be shunned are ignoring reality. I especially despise arguments that are little more than throwing around dogmatic, pithy phrases such as "information wants to be free" and other such mindless drivel. If you really think DRM is the wrong direction, then create the right direction by proposing or building a workable alternative that allows content creators to get paid without losing their rights. I'd like to see more discussions in the blogosphere like Tim Bray's On Selling Art instead of the kind of crud perpetuated by people like Cory Doctorow which made me stop reading Boing Boing.

PS: There's also a good discussion going on in the comments to Shelley's blog post. Check it out.


 

Categories: Technology

January 10, 2006
@ 03:23 PM

I found out about http://www.google.com/ig/dell via John Battelle's blog last night. It looks like Google now has a personalized home page for users of Dell computers.

During the Web 2.0 conference, Sergey Brin commented that "Google innovates with technology not with business". I don't know about that. The AdSense/AdWords market is a stroke of business genius, and the fact that they snagged the AOL deal from more experienced companies like Microsoft shows that behind the mask of technical naïveté is a company with strong business sense.

If I were competing with a company that produced the dominant operating system and Web browser used to access my service, I'd figure out ways to disintermediate them. Perhaps by making deals with OEMs so that the defaults for online services, such as search, that ship on PCs point to my services instead. Maybe I could incentivize them to do this with the promise of recurring revenue, by giving them a cut of the ad revenue from searches performed on said portal pages.

Of course, this may not be what http://www.google.com/ig/dell is for, but even if it isn't, I wouldn't be surprised if that eventually becomes the case.


 

Categories: Current Affairs

Web usability guru Jakob Nielsen has written an article entitled Search Engines as Leeches on the Web, which begins:

Summary: Search engines extract too much of the Web's value, leaving too little for the websites that actually create the content. Liberation from search dependency is a strategic imperative for both websites and software vendors.

I worry that search engines are sucking out too much of the Web's value, acting as leeches on companies that create the very source materials the search engines index. We've known since AltaVista's launch in 1995 that search is one of the Web's most important services. Users rely on search to find what they want among the teeming masses of pages. Recently, however, people have begun using search engines as answer engines to directly access what they want -- often without truly engaging with the websites that provide (and pay for) the services.

I've seen some people claim that "Google is Evil" is the new meme among web geeks, and this looks like a manifestation of that trend. It seems the more money Google makes, the more people resent them. Alas, that is the price of success.


 

The Wall Street Journal has an article entitled The Men Who Came To Dinner, and What They Said About Email, which contains the following excerpt:

"Email is one of the liveliest niches in tech right now. Google, Microsoft and Yahoo all view it as a key to winning new customers and making money off current ones. And so they are innovating with new email programs and services all the time. Since all three companies' email teams are in my neck of the woods, I thought it would be fun to have the heads of each team come over one night for dinner and conversation. The three companies were good sports and agreed, in part because I said I wasn't interested in a shouting match.

As it happened, Google's Paul Buchheit, 29 years old; Kevin Doerr, 39, of Microsoft (no relation to the venture capitalist) and Ethan Diamond, 34, of Yahoo were all on their best behavior. Whatever they may say about their competitors at work, at my table they were gracious and complimentary. Gentle teasing was about as far as they would go.

The evening began with even the Microsoft and Yahoo delegates agreeing that much of the current excitement in the email world can be traced back to last year's debut of Mr. Buchheit's Gmail. The program had a fast user interface with a fresh new look, along with a then-remarkable gigabyte of free storage.

Mr. Buchheit said he started working on Gmail after observing that other email programs were getting worse, not better. Microsoft's Mr. Doerr said that at his company, Gmail was a thunderbolt. "You guys woke us up," he told Mr. Buchheit. Yahoo's Mr. Diamond, then at a startup with its own hot, new email program, said Gmail was the final impetus that Yahoo needed to buy his company.

Mr. Buchheit responded with a victory lap. "We were trying to make the email experience better for our users," he said. "We ended up making it better for yours, too."

The evening wasn't all a Gmail love-in, though. The Microsoft and Yahoo representatives said their many millions of users might not accept some of Gmail's departures from email norms, such as the way the program groups messages into "conversations." The two men also razzed Mr. Buchheit a bit, saying that it had been easy for Google to promise a lot of storage to its users because it carefully controlled how many users Gmail would have by requiring an invitation to get an account."

As someone who has to build services that compete with Google's, the last statement in the above excerpt resonates with me. I tend to think that in a number of their products, such as GMail, Google Talk and even Google Pack, the folks at Google are practising the lessons of articles such as Joel Spolsky's Fire & Motion. In that article, Joel Spolsky argues that large companies like Microsoft tend to create technological imperatives that force competitors to respond and keep up, thus preventing them from focusing on new features.

Examples of Google practising Fire & Motion differ somewhat from what Joel Spolsky describes in his article, but the ideas are similar. Google tends to create initiatives that are either much more expensive for its competitors to provide than for Google (e.g. giving users gigabytes of storage space for email while limiting sign-ups on the service) or detrimental to competitors' market share to match (e.g. allowing non-Google clients to access the Google Talk servers). I've had co-workers joke that for every dollar Google spends on some of its efforts, its competitors are forced to spend five to ten dollars. Here is a back-of-the-envelope calculation that illustrates this point.

Email Service    Estimated Number of Users    Inbox Size    Total Storage Provided
GMail            5 million                    2.5 GB        12.5 petabytes
Yahoo! Mail      219 million                  1 GB          219 petabytes
Hotmail          221 million                  0.25 GB       55.25 petabytes
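
For what it's worth, here is a quick sketch of the arithmetic behind the table (taking 1 petabyte as one million gigabytes):

    using System;

    class StorageEstimate
    {
        static void Main()
        {
            // Total storage = estimated users * inbox size, from the table above.
            PrintTotal("GMail", 5000000, 2.5);         // 12.5 petabytes
            PrintTotal("Yahoo! Mail", 219000000, 1.0); // 219 petabytes
            PrintTotal("Hotmail", 221000000, 0.25);    // 55.25 petabytes
        }

        static void PrintTotal(string service, double users, double inboxSizeInGB)
        {
            double petabytes = (users * inboxSizeInGB) / 1000000.0;
            Console.WriteLine("{0}: {1} petabytes", service, petabytes);
        }
    }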

Of course, these numbers are off because they are based on estimates. Also, I think the Hotmail numbers should be somewhat lower since I haven't confirmed that we've rolled out the 250MB inbox to every market. The point should still be clear, though: Google has forced its competitors such as Microsoft and Yahoo! to spend orders of magnitude more money on storage, which distracts them from competing with Google in the places where it is strong. More importantly, its competitors have to provide 10 to 20 times the total amount of storage Google is providing just to be competitive.

This is often the dilemma when competing with Google. On the one hand, you have customers who rightly point out that Google is more generous; on the other, the fact is that it costs us a whole lot more to do the things Google does since we have a whole lot more users than they do. The cool thing about this is that it forces us to be very imaginative about how we compete in the marketplace, and challenges are always fun.


 

Categories: Life in the B0rg Cube