It's been about three years since I first started on RSS Bandit and it doesn't seem like I've run out of steam yet. Every release the application seems to become more popular and last month we finally broke 100,000 downloads in a single month. The time has come for me to start thinking about what I'd like to see in the next family of releases and elicit feedback from our users. The next release is codenamed Jubilee.

Below is a list of feature areas I'd like to see us work on over the next few months:

  1. Extensibility Framework to Enable Richer Plugins: We currently use the IBlogExtension plugin mechanism which allows one to add new context menu items when right-clicking on an item in the list view. I've used this to implement the "Email This" and "Post to del.icio.us" features which ship with the default install. Torsten implemented "Send to OneNote" using this mechanism as well.

    The next step is to enable richer plugins so people can add their own menu items, toolbar buttons as well as processing steps for received feed items. Torsten used a prototype of this functionality to add Ad blocking features to RSS Bandit. I'd like to add weblog posting functionality using such a plugin model instead of making it a core part of the application since many of our users may just want a reader and not a weblog editor as well.

  2. Comment Watching: For many blogs such as Slashdot and Mini-Microsoft, the comments are often more interesting than the actual blog post. In the next version we'd like to make it easier to not only be updated when new blog posts appear in a blog you are interested in but also when new comments show up in a post you are interested in as well.

  3. Provide better support for podcasts and other rich media in feeds: With the growing popularity of podcasts, we plan to make it easier for users to discover and download rich media from their feeds. This doesn't just mean supporting downloading media files in the background but also supporting better ways of displaying rich media in our default views. Examples of what we have in mind can be taken from the post Why should text have all the fun? on the Google Reader blog. We should have richer experiences for photo feeds, audio feeds and video feeds.

  4. Thumbs Up & Thumbs Down for Filtering and Suggesting New Feeds: A big problem with using a news aggregator is that it eventually leads to information overload. One tends to subscribe to feeds which produce lots of content, only a subset of which is of interest. At the other extreme, users often find it difficult to find new content that matches their interests. Both of these problems can be solved by providing a mechanism which allows the user to rate feeds or entries, such as a thumbs up or thumbs down rating similar to what systems like TiVo use today. This system can be used to highlight items of interest from subscribed feeds or to suggest new feeds using a suggestion service such as AmphetaRate.

  5. Applying search filters to the list view: In certain cases a user may want to perform the equivalent of a search on the items currently being displayed in the list view without resorting to an explicit search. An example is showing all the unread items in the list view. RSS Bandit should provide a way to apply filters to the items currently being displayed in the list view either by providing certain predefined filters or providing the option to apply search folder queries as filters.
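The richer plugin model sketched in item 1 above boils down to a host that exposes hooks for menu items, toolbar buttons, and item-processing steps. The following is an illustrative Python sketch of that hook pattern, not RSS Bandit's actual API; all the names here are hypothetical:

```python
# Hypothetical sketch of the plugin model from item 1: plugins register
# UI contributions and processing steps that run over received feed items.
from typing import Callable

class PluginHost:
    def __init__(self):
        self.menu_items = []        # (label, callback) pairs
        self.toolbar_buttons = []   # (label, callback) pairs
        self.item_processors = []   # run over every received feed item

    def add_menu_item(self, label, callback):
        self.menu_items.append((label, callback))

    def add_item_processor(self, processor: Callable[[dict], dict]):
        self.item_processors.append(processor)

    def process_item(self, item: dict) -> dict:
        # Each registered processor gets a chance to transform the item
        # before it is displayed or stored.
        for processor in self.item_processors:
            item = processor(item)
        return item

# An ad-blocking processor like the one Torsten prototyped might strip
# known ad markup from an item's content before display (marker is made up).
def strip_ads(item):
    item = dict(item)
    item["content"] = item["content"].replace("<!--ad-->", "")
    return item

host = PluginHost()
host.add_item_processor(strip_ads)
```

A weblog-posting plugin would register a menu item the same way, which is what keeps that feature out of the application core.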
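The thumbs up/thumbs down scheme in item 4 can be sketched very simply: ratings on items roll up into per-feed scores, which can then drive highlighting or be shipped off to a suggestion service. A toy Python sketch (not a design commitment):

```python
# Toy sketch of item 4: per-feed scores accumulated from user ratings.
from collections import defaultdict

class FeedRater:
    def __init__(self):
        self.scores = defaultdict(int)

    def thumbs_up(self, feed):
        self.scores[feed] += 1

    def thumbs_down(self, feed):
        self.scores[feed] -= 1

    def ranked_feeds(self):
        # Highest-scoring feeds first: candidates to highlight, or to send
        # to a suggestion service such as AmphetaRate.
        return sorted(self.scores, key=self.scores.get, reverse=True)

rater = FeedRater()
rater.thumbs_up("feedA")
rater.thumbs_up("feedA")
rater.thumbs_down("feedB")
```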
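The list-view filtering in item 5 is essentially applying a predicate to the currently displayed items, whether the predicate is predefined ("unread only") or derived from a search folder query. A minimal Python sketch of the idea:

```python
# Minimal sketch of item 5: filters are just predicates over displayed items.
def apply_filter(items, predicate):
    return [item for item in items if predicate(item)]

# Predefined filters; a search-folder query would compile to a predicate too.
unread = lambda item: not item["read"]
flagged = lambda item: item.get("flagged", False)

items = [
    {"title": "A", "read": True},
    {"title": "B", "read": False},
]
visible = apply_filter(items, unread)   # only unread items remain
```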

These are just some of the ideas I've had. There are also the dozens of feature requests we've received from our users over the past couple of months which we'll use as fodder for ideas for the Jubilee release.


 

Categories: RSS Bandit

January 15, 2006
@ 08:08 PM

Dave Winer made the following insightful observation in a recent blog post

Jeremy Zawodny, who works at Yahoo, says that Google is Yahoo 2.0. Very clever, and there's a lot of truth to it, but watch out, that's not a very good place to be. That's how Microsoft came to dominate the PC software industry. By shipping (following the analogy) WordPerfect 2.0 (and WordStar, MacWrite and Multimate) and dBASE 2.0 (by acquiring FoxBase) and Lotus 2.0 (also known as Excel). It's better to produce your own 2.0s, as Microsoft's vanquished competitors would likely tell you.

Microsoft's corporate culture is very much about looking at an established market leader then building a competing product which is (i) integrated with a family of Microsoft products and (ii) fixes some of the weaknesses in the competitors' offerings. The company even came up with the buzzword Integrated Innovation to describe some of these aspects of its corporate strategy.

Going further, one could argue that when Microsoft does try to push disruptive new ideas the lack of a competitor to focus on leads to floundering by the product teams involved. Projects such as WinFS, Netdocs and even Hailstorm can be cited as examples of projects that floundered due to the lack of a competitive focus.

New employees at Microsoft are sometimes frustrated by this aspect of Microsoft's culture. For some it's hard to acknowledge that working at Microsoft isn't about building cool, new stuff but about building cooler versions of products offered by our competitors which integrate well with other Microsoft products. This ethos not only brought us Microsoft Office, which Dave mentions in his post, but also newer examples including the Xbox (a better PlayStation), C# (a better Java) and MSN Spaces (a better TypePad/Blogger/LiveJournal).

The main reason I'm writing this is so I don't have to keep explaining it to people, I can just give them a link to this blog post next time it comes up.


 

Categories: Life in the B0rg Cube

A few members of the Hotmail/Windows Live Mail team have been doing some writing about scalability recently.

From the ACM Queue article A Conversation with Phil Smoot

BF Can you give us some sense of just how big Hotmail is and what the challenges of dealing with something that size are?

PS Hotmail is a service consisting of thousands of machines and multiple petabytes of data. It executes billions of transactions over hundreds of applications agglomerated over nine years—services that are built on services that are built on services. Some of the challenges are keeping the site running: namely dealing with abuse and spam; keeping an aggressive, Internet-style pace of shipping features and functionality every three and six months; and planning how to release complex changes over a set of multiple releases.

QA is a challenge in the sense that mimicking Internet loads on our QA lab machines is a hard engineering problem. The production site consists of hundreds of services deployed over multiple years, and the QA lab is relatively small, so re-creating a part of the environment or a particular issue in the QA lab in a timely fashion is a hard problem. Manageability is a challenge in that you want to keep your administrative headcount flat as you scale out the number of machines.

BF I have this sense that the challenges don’t scale uniformly. In other words, are there certain scaling points where the problem just looks completely different from how it looked before? Are there things that are just fundamentally different about managing tens of thousands of systems compared with managing thousands or hundreds?

PS Sort of, but we tend to think that if you can manage five servers you should be able to manage tens of thousands of servers and hundreds of thousands of servers just by having everything fully automated—and that all the automation hooks need to be built in the service from the get-go. Deployment of bits is an example of code that needs to be automated. You don’t want your administrators touching individual boxes making manual changes. But on the other side, we have roll-out plans for deployment that smaller services probably would not have to consider. For example, when we roll out a new version of a service to the site, we don’t flip the whole site at once.

We do some staging, where we’ll validate the new version on a server and then roll it out to 10 servers and then to 100 servers and then to 1,000 servers—until we get it across the site. This leads to another interesting problem, which is versioning: the notion that you have to have multiple versions of software running across the sites at the same time. That is, version N and N+1 clients need to be able to talk to version N and N+1 servers and N and N+1 data formats. That problem arises as you roll out new versions or as you try different configurations or tunings across the site.

Another hard problem is load balancing across the site. That is, ensuring that user transactions and storage capacity are equally distributed over all the nodes in the system without any particular set of nodes getting too hot.
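The staged roll-out Smoot describes (1 server, then 10, then 100, then 1,000, until the site is covered) can be sketched as an exponential ramp. This is a toy illustration of the schedule, not Hotmail's actual deployment tooling:

```python
# Toy sketch of the staged roll-out described above: validate on one
# server, then ramp up by a factor of 10 until the whole site is covered.
def rollout_stages(total_servers, factor=10):
    batch, deployed = 1, 0
    while deployed < total_servers:
        batch_size = min(batch, total_servers - deployed)
        yield batch_size
        deployed += batch_size
        batch *= factor

stages = list(rollout_stages(1500))
# e.g. [1, 10, 100, 1000, 389] for a hypothetical 1,500-server site
```

The versioning problem he mentions falls out of this directly: during every stage, version N and N+1 of the service are live at the same time and must interoperate.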

From the blog post entitled Issues with .NET Frameworks 2.0 by Walter Hsueh

Our team is tackling the scale issues, delving deep into the CLR and understanding its behavior.  We've identified at least two issues in .NET Frameworks 2.0 that are "low-hanging fruit", and are hunting for more.

1a)  Regular Expressions can be very expensive.  Certain (unintended and intended) strings may cause RegExes to exhibit exponential behavior.  We've taken several hotfixes for this.  RegExes are so handy, but devs really need to understand how they work; we've gotten bitten by them.

1b)  Designing an AJAX-style browser application (like most engineering problems) involves trading one problem for another.  We can choose to shift the application burden from the client onto the server.  In the case of RegExes, it might make sense to move them to the client (where CPU can be freely used) instead of having them run on the server (where you have to share).  WindowsLive Mail made this tradeoff in one case.

2)  Managed Thread Local Storage (TLS) is expensive.  There is a global lock in the Whidbey RTM implementation of Thread.GetData/Thread.SetData which causes scalability issues.  Recommendation is to use the [ThreadStatic] attribute on static class variables.  Our RPS went up, our CPU % went down, context switches dropped by 50%, and lock contentions dropped by over 80%.  Good stuff.
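The regular-expression issue in 1a is easy to demonstrate outside .NET: nested quantifiers can make a non-matching input trigger exponential backtracking ("catastrophic backtracking"). A Python sketch, with the pattern and input sizes chosen purely for illustration:

```python
# Catastrophic backtracking: each extra 'a' roughly doubles the work the
# pathological pattern does on a non-matching input.
import re
import time

EVIL = re.compile(r'^(a+)+$')   # nested quantifiers: ambiguous partitions
SAFE = re.compile(r'^a+$')      # equivalent, unambiguous rewrite

def time_match(pattern, text):
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start

# A short non-matching string is already enough to see the blow-up.
text = 'a' * 18 + 'b'
_, evil_time = time_match(EVIL, text)
_, safe_time = time_match(SAFE, text)
print(f"evil: {evil_time:.4f}s  safe: {safe_time:.6f}s")
```

This is why Hsueh's warning matters: the pattern is correct on matching inputs and only melts down on (possibly attacker-supplied) near-misses.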
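The fix in 2 is .NET-specific ([ThreadStatic] fields instead of Thread.GetData/SetData), but the underlying principle is general: prefer lock-free per-thread slots over a shared, lock-guarded store. A Python analogy using threading.local:

```python
# Analogy for the TLS fix above: threading.local gives each thread its own
# slot with no global lock, unlike a shared dictionary guarded by a mutex.
import threading

state = threading.local()

def worker(value, results, results_lock):
    state.value = value              # per-thread slot, no contention
    with results_lock:               # lock only the shared results list
        results.append(state.value)

results, results_lock = [], threading.Lock()
threads = [threading.Thread(target=worker, args=(i, results, results_lock))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Each thread saw only its own value in state.value.
```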

Our devs have also started migrating some of our services to Whidbey and they've found some interesting issues with regards to performance. It'd probably be a good idea to put together some sort of "lessons learned while building mega-scale services on the .NET Framework" article.


 

Categories: Windows Live

Recently we had some availability issues with MSN Spaces which drew complaints from some of our loyal customers. Mike Torres addresses these issues in his post Performance & uptime which states

One of the hardest parts about running a worldwide service with tens of millions of users is maintaining service performance and overall uptime.  As a matter of fact, a member of our team (Dare) had some thoughts about this not too long ago.  While we're constantly working towards 100% availability and providing the world's fastest service, sometimes we run into snags along the way that impact your experience with MSN Spaces.
 
That seems to have happened yesterday.  For the networking people out there, it turned out to be a problem with a load balancing device resulting in packet loss (hence the overall slowness of the site).  After some investigation, the team was able to determine the cause and restore the site back to normal.
 
Rest assured that as soon as the service slows down even a little bit, or it becomes more difficult to reach individual spaces, we're immediately aware of it here within our service operations center.  Within minutes we have people working hard to restore things to their normal speedy and reliable state.  Of course, sometimes it takes a little while to get things back to normal - but don't believe for a second that we aren't aware or concerned about the problem.  As a matter of fact, almost everyone on our team uses Spaces daily (surprise!) so we are just as frustrated as you are when things slow down.  So I'm personally sorry if you were frustrated yesterday - I know I was!  We are going to continue to do everything we can to minimize any impact on your experience...  most of the time we'll be successful and every once in a while we won't.  But it's our highest priority and you have a firm commitment from us to do so.

I'm glad to see us being more transparent about what's going on with our services. This is a good step.


 

Categories: MSN

Over a year ago, I wrote a blog post entitled SGML on the Web: A Failed Dream? where I asked whether the original vision of XML had failed. Below are excerpts from that post

The people who got together to produce the XML 1.0 recommendation were motivated to do this because they saw a need for SGML on the Web. Specifically
their discussions focused on two general areas:
  • Classes of software applications for which HTML was an inadequate information format
  • Aspects of the SGML standard itself that impeded SGML's acceptance as a widespread information technology

The first discussion established the need for SGML on the web. By articulating worthwhile, even mission-critical work that could be done on the web if there were a suitable information format, the SGML experts hoped to justify SGML on the web with some compelling business cases.

The second discussion raised the thornier issue of how to "fix" SGML so that it was suitable for the web.

And thus XML was born.
...
The W3C's attempts to get people to author XML directly on the Web have mostly failed as can be seen by the dismal adoption rate of XHTML and in fact many [including myself] have come to the conclusion that the benefits of adopting XHTML are too low, if not non-existent, compared to the costs. There was once an expectation that content producers would be able to place documents conformant to their own XML vocabularies on the Web and then display would entirely be handled by stylesheets but this is yet to become widespread. In fact, at least one member of a W3C working group has called this a bad practice since it means that User Agents that aren't sophisticated enough to understand style sheets are left out in the cold.

Interestingly enough, although XML has not been as successful as its originators initially expected as a markup language for authoring documents on the Web, it has found significant success as the successor to the Comma Separated Value (CSV) file format. XML's primary usage on the Web and even within internal networks is for exchanging machine generated, structured data between applications. Speculatively, the largest usage of XML on the Web today is RSS, and it conforms to this pattern.

These thoughts were recently rekindled when reading Tim Bray's recent post Don’t Invent XML Languages where Tim Bray argues that people should stop designing new XML formats. For designing new data formats for the Web, Tim Bray advocates the use of Microformats instead of XML.

The vision behind microformats is completely different from the XML vision. The original XML inventors started with the premise that HTML is not expressive enough to describe every possible document type that would be exchanged on the Web. Proponents of microformats argue that one can embed additional semantics over HTML and thus HTML is expressive enough to represent every possible document type that could be exchanged on the Web. I've always considered it a gross hack to think that instead of having an HTML web page for my blog and an Atom/RSS feed, I should have a single HTML page with <div class="rss:item"> or <h3 class="atom:title"> embedded in it instead. However given that one of the inventors of XML (Tim Bray) is now advocating this approach, I wonder if I'm simply clinging to old ways and have become the kind of intellectual dinosaur I bemoan.


 

Categories: XML

The documentation for the implementation of the MetaWeblog API for MSN Spaces is now available on MSDN.

Developers interested in building applications that can be used to create, edit or delete blog posts on a space should read the documentation about the MetaWeblogAPI and MSN Spaces. Questions about the API should be directed to the MSN Spaces Development forum.

PS: I forgot to blog about this over the holidays but astute folks who've been watching http://msdn.microsoft.com/msn already found this out.


 

Categories: Windows Live

January 11, 2006
@ 03:54 AM
The following is a tutorial on posting to your blog on MSN Spaces using Flickr.
  1. Create a Space on http://spaces.msn.com if you don't have one

  2. Go to 'Edit Your Space->Settings->Email Publishing'

  3. Turn on Email Publishing (screenshot below)

  4. Choose a secret word (screenshot below)

  5. Create an account on Yahoo or Flickr if you don't have one

  6. Go to the "Add a Weblog" page at http://flickr.com/blogs_add_metaweblogapi.gne

  7. Specify the API end point as https://storage.msn.com/StorageService/MetaWeblog.rpc. The user name is the name of your space (e.g. I use 'carnage4life' because the URL of my space is http://spaces.msn.com/members/carnage4life). The password is the secret word you selected when you turned on Email Publishing on your space.

  8. Click "Next", then Click "All Done"

  9. Go ahead and create blog posts on your space directly from Flickr. You can see the test post I made to my space from Flickr here.

PS: Thanks to the Yahoo! folks on the Flickr team who helped debug the issues that prevented this from working when we first shipped our MetaWeblog API support.
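For developers who'd rather skip the Flickr UI, the endpoint in step 7 can be exercised directly over XML-RPC. A minimal Python sketch; it assumes the space name doubles as the blogid (which is how the tutorial's credentials read, but treat that as an assumption), and new_post will only succeed against a real space with Email Publishing turned on:

```python
# Minimal MetaWeblog client for the endpoint from step 7 of the tutorial.
import xmlrpc.client

# User name is the space name; password is the Email Publishing secret word.
ENDPOINT = "https://storage.msn.com/StorageService/MetaWeblog.rpc"

def build_post(title, description, categories=None):
    """Build the struct that metaWeblog.newPost expects."""
    return {
        "title": title,
        "description": description,
        "categories": categories or [],
    }

def new_post(space_name, secret_word, post, publish=True):
    """Create a blog post on a space via the MetaWeblog API.

    Assumption: the space name is also used as the blogid parameter.
    """
    server = xmlrpc.client.ServerProxy(ENDPOINT)
    # metaWeblog.newPost(blogid, username, password, struct, publish)
    return server.metaWeblog.newPost(
        space_name, space_name, secret_word, post, publish)

post = build_post("Hello from XML-RPC", "<p>Posted via MetaWeblog.</p>")
```

This is exactly what Flickr's "Add a Weblog" integration does on your behalf once you hand it the endpoint and credentials.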


 

Categories: Windows Live

January 10, 2006
@ 04:27 PM

In his blog post Windows DVD Maker Not Managed Code Charles Cook writes

Last week Eric Gunnerson mentioned that he has been working on an application for Vista: Windows DVD Maker. Yesterday he posted a FAQ for the application. The answer to question 4 was disappointing:

4: Is DVD Maker written in managed code?

A: No. Yes, it is ironic that I spent so much time on C# and then spent a ton of time writing something in C++ code. Everybody on the team is a believer in managed code, and we hope we'll be able to use it for future projects.

Given that there is a whole new set of APIs in Vista for writing managed applications - Avalon, WinFX, etc - why has a new self-contained app like this been written in unmanaged C++? Actually writing real applications, instead of just samples, with the new managed APIs would be far more convincing than any amount of hype from Robert Scoble.

I agree with Charles: if Microsoft believes in managed code, we should be building applications using the .NET Framework. And, in fact, we do.

In his post Cha-Cha-Changes Dan Fernandez wrote

The Microsoft's not using Managed Code Myth
One of the biggest challenges in my old job was that customers didn't think Microsoft was using managed code. Well, the truth is that we have a good amount of managed code in the three years that the .NET Framework has been released including operating systems, client tools, Web properties, and Intranet applications. For those of you that refuse to believe, here's an estimate of the lines of managed code in Microsoft applications that I got permission to blog about:

  • Visual Studio 2005: 7.5 million lines
  • SQL Server 2005: 3 million lines
  • BizTalk Server: 2 million lines
  • Visual Studio Team System: 1.7 million lines
  • Windows Presentation Foundation: 900K lines
  • Windows Sharepoint Services: 750K lines
  • Expression Interactive Designer: 250K lines  
  • Sharepoint Portal Server: 200K lines
  • Content Management Server: 100K lines

We also use managed code for the online services that power various MSN and Windows Live properties, from Windows Live Messenger and Windows Live Mail to Live.com and Windows Live Expo. I find it surprising that people continue to think we don't use managed code at Microsoft.


 

Categories: Life in the B0rg Cube

There are a couple of contentious topics I tend not to bother debating online because people on both sides of the argument tend to have entrenched positions. The debate on abortion in the U.S. is an example of such a topic. Another one for me is DRM and its sister topics: piracy, copyright infringement and file sharing networks.

Shelley Powers doesn't seem to have my aversion for these topics and has written an insightful post entitled Debate on DRM which contains the following excerpt

Doc Searls points to a weblog post by the Guardian Unlimited’s Lloyd Shepherd on DRM and says it’s one of the most depressing things he’s read. Shepherd wrote:

I’m not going to pick a fight with the Cory Doctorows of the world because they’re far more informed and cleverer than me, but let’s face it: we’re going to have to have some DRM. At some level, there has to be an appropriate level of control over content to make it economically feasible for people to produce it at anything like an industrial level. And on the other side of things, it’s clear that the people who make the consumer technology that ordinary people actually use - the Microsofts and Apples of the world - have already accepted and embraced this. The argument has already moved on.

Doc points to others making arguments in refutation of Shepherd’s thesis (Tom Coates and Julian Bond), and ends his post with:

We need to do with video what we’ve started doing with music: building a new and independent industry...


I don’t see how DRM necessarily disables independents from continuing their efforts. Apple has invested in iTunes and iPods, but one can still listen to other formats and subscribe to other services from a Mac. In fact, what Shepard is proposing is that we accept the fact that companies like Apple and Google and Microsoft and Yahoo are going to have these mechanisms in place, and what can we do to ensure we continue to have options on our desktops?

There’s another issue though that’s of importance to me in that the concept of debate being debated (how’s this for a circular discussion). The Cluetrain debate method consists of throwing pithy phrases at each other over (pick one): spicey noodles in Silicon Valley; a glass of ale in London; something with bread in Paris; a Boston conference; donuts in New York. He or she who ends up with the most attention (however attention is measured) wins.

In Doc’s weblog comments, I wrote:

What debate, though? Those of us who have pointed out serious concerns with Creative Commons (even demonstrating problems) are ignored by the creative commons people. Doc, you don’t debate. You repeat the same mantra over and over again: DRM is bad, openness is good. Long live the open internet (all the while you cover your ears with your hands and hum “We are the Champions” by Queen under your breath).

Seems to me that Lloyd Shepherd is having the debate you want. He’s saying, DRM is here, it’s real, so now how are we going to come up with something that benefits all of us?

Turning around going, “Bad DRM! Bad!” followed by pointing to other people going “Bad DRM! Bad!” is not an effective response. Neither is saying how unprofitable it is, when we only have to turn our little eyeballs over to iTunes to generate an “Oh, yeah?”

Look at the arguments in the comments to Shepherd’s post. He is saying that as a business model, we’re seeing DRM work. The argument back is that the technology fails. He’s talking ‘business’ and the response is ‘technology’. And when he tries to return to business, the people keep going back to technology (with cries of ‘…doomed to failure! Darknet!’).

The CES you went to showed that DRM is happening. So now, what can we do to have input into this to ensure that we’re not left with orphaned content if a particular DRM goes belly up? That we have fair use of the material? If it is going to exist, what can we do to ensure we’re not all stuck with betamax when the world goes VHS?

Rumbles of ‘darknet’, pointers to music stores that feature few popular artists, and clumsy geeky software as well as loud hyperbole from what is a small majority does not make a ‘debate’. Debate is acknowledging what the other ’side’ is saying, and responding accordingly. Debate requires some openness.

There is reason to be concerned about DRM (Digital Rights Management–using technology to restrict access to specific types of media). If operating systems begin to limit what we can and cannot use to view or create certain types of media; if search engine companies restrict access to specific types of files; if commercial competition means that me having an iPod, as compared to some other device, limits the music or services at other companies I have access to, we are at risk in seeing certain components of the internet torn into pieces and portioned off to the highest bidders.

But by saying that all DRM is evil and that the only recourse we have is to keep the Internet completely free, and only with independents will we win and we will win, oh yes we will–this not only disregards the actuality of what’s happening now, it also disregards that at times, DRM can be helpful for those not as well versed in internet technologies.

I tend to agree with Shelley 100% [as usual]. As much as the geeks hate to admit it, DRM is here to stay. The iTunes/iPod combination has shown that consumers will accept DRM in situations where they are provided value, and that the business model can be profitable. Secondly, as Lloyd Shepherd points out, the major technology companies from Microsoft and Intel to Apple and Google are all building support for DRM into their products for purchasing and/or consuming digital media.

Absolutists who argue that DRM is evil and should be shunned are ignoring reality. I especially despise arguments that are little more than throwing around dogmatic, pithy phrases such as "information wants to be free" and other such mindless drivel. If you really think DRM is the wrong direction, then create the right direction by proposing or building a workable alternative that allows content creators to get paid without losing their rights. I'd like to see more discussions in the blogosphere like Tim Bray's On Selling Art instead of the kind of crud perpetuated by people like Cory Doctorow which made me stop reading Boing Boing.

PS: There's also a good discussion going on in the comments to Shelley's blog post. Check it out.


 

Categories: Technology

January 10, 2006
@ 03:23 PM

I found out about http://www.google.com/ig/dell via John Battelle's blog last night. It looks like Google now has a personalized home page for users of Dell computers.

During the Web 2.0 conference, Sergey Brin commented that "Google innovates with technology, not with business". I don't know about that. The AdSense/AdWords market is business genius, and the fact that they snagged the AOL deal from more experienced companies like Microsoft shows that behind the mask of technical naïveté is a company with strong business sense.

If I were competing with a company that produced the dominant operating system and Web browser used to access my service, I'd figure out ways to disintermediate them. Perhaps by making deals with OEMs so that all the defaults for online services such as search which ship on PCs point to my services. Maybe I could incentivize them to do this with the promise of recurring revenue by giving them a cut of ad revenue from searches performed on said portal pages.

Of course, this may not be what http://www.google.com/ig/dell is for, but if it isn't, I wouldn't be surprised if it eventually becomes the case.


 

Categories: Current Affairs