Greg Linden has a blog post entitled Yahoo building a Google FS clone? where he writes

The Hadoop open source project is building a clone of the powerful Google cluster tools Google File System and MapReduce.

I was curious to see how much Yahoo appears to be involved in Hadoop. Doug Cutting, the primary developer of Lucene, Nutch, and Hadoop, is now working for Yahoo but, at the time, that hiring was described as supporting an independent open source project.

Digging further, it seems Yahoo's role is more complicated. Browsing through the Hadoop developers mailing list, I can see that more than a dozen people from Yahoo appear to be involved in Hadoop. In some cases, the involvement is deep. One of the Yahoo developers, Konstantin Shvachko, produced a detailed requirement document for Hadoop. The document appears to lay out what Yahoo needs from Hadoop, including such tidbits as handling 10k+ nodes, 100k simultaneous clients, and 10 petabytes in a cluster.

Also noteworthy is Eric Baldeschwieler, a director of software development at Yahoo, who recently talked about direct support from Yahoo for Hadoop. Eric said, "How we are going to establish a testing / validation regime that will support innovation ... We'll be happy to help staff / fund such a testing policy."

I find this effort by Yahoo! to be rather interesting given that platform pieces like GFS, BigTable, MapReduce and Sawzall give Google quite the edge in building mega-scale services and, in Greg Linden's words, are 'major force multipliers' that enable them to pump out new online services at a rapid pace. I'd expect Google's competitors to build similar systems and keep them close to their chest, not give them away. I suspect that the reason Yahoo! is going this route is that they don't have enough folks to build this in-house and have thus collaborated with the Hadoop project to get some help. This could potentially backfire since there is nothing stopping small or large competitors from reusing their efforts, especially if the project uses a traditional Open Source license.

On a related note, Greg also posted a link to an article by David F. Carr entitled How Google Works which has the following interesting quote

Google has a split personality when it comes to questions about its back-end systems. To the media, its answer is, "Sorry, we don't talk about our infrastructure."

Yet, Google engineers crack the door open wider when addressing computer science audiences, such as rooms full of graduate students whom it is interested in recruiting.

As a result, sources for this story included technical presentations available from the University of Washington Web site, as well as other technical conference presentations, and papers published by Google's research arm, Google Labs.

I do think it is cool that Google developers publish so much about the stuff they are working on. One of the things I miss from being on the XML team at Microsoft is being around people with a culture of publishing research like Erik Meijer and Michael Rys. I even got a research paper on XML query languages published while on the team. I'd definitely like to publish research quality papers on some of the stuff I'm working on now. I've done MSDN articles and a ThinkWeek paper in the past few years, so it's probably about time I start thinking about writing a research paper again.

PS: If you work on online services and you don't read Greg Linden's blog, you are missing out. Subscribed. 


 

Over the weekend, I had a few hours to spend and finally added comment watching to RSS Bandit. The feature is pretty straightforward: users have the ability to mark an item as 'Watched'. Once an item is in this state, an indication is made when there are new comments for that item. Determining whether there are new comments uses a number of mechanisms, including polling the comment feed and checking the values of RSS/Atom extensions such as slash:comments and thr:count. I'm already getting a lot of use out of the feature to passively notify me of new comments to my blog.
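For the curious, here is roughly what the extension-based check could look like. This is a minimal Python sketch, not RSS Bandit's actual implementation: the function names and the idea of remembering a last-seen count per watched item are assumptions made for illustration. It reads slash:comments as a child element of an RSS item and thr:count as an attribute on an Atom link with rel="replies", which is how the Atom threading extension defines it.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the RSS slash module, Atom threading extension and Atom.
SLASH_NS = "http://purl.org/rss/1.0/modules/slash/"
THR_NS = "http://purl.org/syndication/thread/1.0"
ATOM_NS = "http://www.w3.org/2005/Atom"


def comment_count(item):
    """Return the comment count advertised by a feed item, or None.

    Hypothetical helper: 'item' is a parsed <item> or <entry> element.
    """
    # RSS style: <slash:comments>12</slash:comments>
    slash = item.find(f"{{{SLASH_NS}}}comments")
    if slash is not None and slash.text and slash.text.strip().isdigit():
        return int(slash.text.strip())

    # Atom style: <link rel="replies" thr:count="12" href="..."/>
    for link in item.iter(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "replies":
            count = link.get(f"{{{THR_NS}}}count")
            if count is not None and count.strip().isdigit():
                return int(count.strip())

    # Neither extension is present; a reader would fall back to
    # polling the comment feed itself.
    return None


def has_new_comments(item_xml, last_seen_count):
    """True if the item now advertises more comments than were last seen."""
    current = comment_count(ET.fromstring(item_xml))
    return current is not None and current > last_seen_count
```

A watched item would store the count from its last check and call has_new_comments on each refresh. When neither extension is present, the function returns False and the comment feed has to be polled instead.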

The only issue now is that there is a disagreement between Torsten and me as to what the menu interaction should be for the feature. I've currently implemented the menu option as a submenu where you can select 'Watch Comments->On' or 'Watch Comments->Off' depending on whether comments are currently being watched for that item or not. See the screenshot below.

Torsten would prefer a menu option more like what Outlook Express has, where the menu option is a checkbox as shown in the OE screenshot below.

If you're an RSS Bandit user, can you chime in with your opinion?


 

Categories: RSS Bandit

The Office team continues to impress me with how savvy they are about the changing software landscape. In his blog post entitled Open XML Translator project announced (ODF support for Office) Brian Jones writes

Today we are announcing the creation of the Open XML Translator project that will help translate between the Office Open XML formats and the OpenDocument format. We've talked a lot about the value the Open XML formats bring, and one of them of course is the ability to filter it down into other formats. While we still aren't seeing a strong demand for ODF support from our corporate or consumer customers, it's now a bit different with governments. We've had some governments request that we help build solutions so that can use ODF for certain situations, so that's why we are creating the Open XML Translator project. I think it's going to be really beneficial to a number of folks and for a number of reasons.

There has been a push in Microsoft for better interoperability and this is another great step in that direction. We already have the PDF and XPS support for Office 2007 users that unfortunately had to be separated out of the product and instead offered as a free download. There will be a menu item in the Office applications that will point people to the downloads for XPS, PDF, and now ODF. So you'll have the ability to save to and open ODF files directly within Office (just like any other format).

For me, one of the really cool parts of this project is that it will be open source and located up on SourceForge, which means everyone will have the ability to see how to leverage the open architectures of both the Office Open XML formats and ODF. We're developing the tools with the help of Clever Age (based in France) and a few other folks like Aztecsoft (based in India) and Dialogika (based in Germany). There should actually be a prototype of the first translator (for Word 2007) posted up on SourceForge later on today (http://sourceforge.net/projects/odf-converter). It's going to be made available under the BSD license, and anyone can provide feedback, submit bugs, and of course directly contribute to the project. The Word tool should be available by the end of this year, with the Excel and PPT versions following in 2007.

This announcement is cool on so many levels. The coolest being that the projects will not only be Open Source but will be hosted on SourceForge. That is sweet. It is interesting to note that it is government customers and not businesses that are interested in ODF support in Office. I guess that makes sense if you consider which parties have been expressing interest in Open Office.

There are already some great analyst responses to this move, such as that of Stephen O'Grady of Redmonk, whose post Microsoft Office to Support ODF: The Q&A has some great insights. My favorite insight is excerpted below

Q: How about Microsoft's competitors?
A: Well, this is a bittersweet moment for them. For those like Corel that have eschewed ODF support, it's a matter of minor importance - at least until Microsoft is able to compete in public sector markets that mandate ODF and they are not.

But for those vendors that have touted ODF support as a diffentiator, this is a good news/bad news deal. The good news is that they can and almost certainly will point to Microsoft's support as validation of further ODF traction and momentum, they will now be competing - at least in theory, remember the limitation - with an Office suite that is frankly the most capable on the market. I've said for years that packages like OpenOffice.org are more than good enough for the majority of users, and that's been validated by our own usage of the product over the past few years; but Microsoft's suite is better than good enough. I'm interested to see if there's any fallout from the UI overhaul, but for now Office remains the undisputed champ of the Office arena. This means that commercial packages like StarOffice and Workplace, not to mention open source projects such as Abiword, KOffice, and OpenOffice.org will have to compete more on features and innovation and less on their support for formats such as ODF or PDF.

It'll be good to see the debate migrate away from support for file formats back to exactly which product's features provide the best value for customers. Everybody wins. Mad props to the Office team for making this decision. Rock on.


 

Categories: XML

Dave Winer has a blog post where he responds to a post entitled SOAP, REST and XML-RPC by Randy Charles Morin. He writes

I wonder if it's be possible for me to disagree with Randy Morin without getting flamed. I never said XML-RPC is better than SOAP or REST, or more perfect or pure, or better documented. I don't care if the others have better websites, or more advocates posting on mail lists. The reason I advise would-be platform developers to support XML-RPC is because at least for some developers (including me) it's so much faster to implement, so we spend less time creating glue and get to building applications sooner. I've learned that the sooner developers get to the fun part, the more likely they are to deploy. And if that's the goal, why not support it? BTW, I never said they shouldn't support SOAP or REST, in fact I often provide multiple interfaces to my would-be platforms, because I've learned that if you want uptake for new ideas, you shouldn't argue over small things like this, you should say yes whenever you can.

I agree 100% with Dave Winer. If you are building a service on the Web, then you shouldn't discriminate against any platform, application or device. This means you can't pick one approach or one technology for building your service because different platforms have different levels of support for various approaches. A developer using Visual Studio will find using SOAP easier than REST or XML-RPC, while on the flip side a developer using Python or Perl is likely more at home dealing with XML-RPC than using SOAP. Choosing one technology over the other is choosing to discriminate against one platform or set of developers over the other.

In some cases this is necessary to keep maintenance costs down by supporting a small set of protocols but in general if you are building a service on the Web, you want it to be inclusive not exclusive. Arguments of technological superiority be damned.
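To give a feel for how little glue XML-RPC needs on a platform like Python, here is a small sketch using the standard library's xmlrpc.client module. The method name and arguments are made up for illustration and don't correspond to any real service. The sketch only encodes and decodes a call locally, so no server is required to try it.

```python
import xmlrpc.client

# Hypothetical method name and arguments, for illustration only.
method_name = "feeds.getRecentItems"
arguments = ("http://example.org/feed.xml", 10)

# Encode the call as the XML document that would be POSTed to the service.
request_xml = xmlrpc.client.dumps(arguments, methodname=method_name)

# Decoding reverses the process, which is what the server side would do.
decoded_params, decoded_method = xmlrpc.client.loads(request_xml)
```

Against a live endpoint, the same call would normally go through xmlrpc.client.ServerProxy, which handles the HTTP round trip and the encoding in one step.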


 

Categories: XML Web Services

The past few days seem to have been quite interesting in the comments section of the Mini-Microsoft blog. Ex-Microsoft employee Robert Scoble jumped into some comment threads where some of his former bosses were being criticized (start here) and it quickly devolved into a flame war. In the aftermath of that flame war, Mini posted an entry entitled Bad Mini, Scoble's Exit, and Truthiness - Links which also led to another series of interesting comments from Robert. The most interesting of these seems not to have been posted but is instead referenced in this excerpted comment by Who da'Punk (aka Mini-Microsoft)

Okay, okay, hold on... things are getting heated again. I've got about six posts in the queue, including Mr. Scoble's "Goodbye I won't ever be commenting here again," comment. So, please hold on to your "Grr, Scoble!" comments because he won't be following up, let alone perhaps reading them. You'd be much better served submitting your comments to his blog or writing your own blog entries and linking appropriately.
...
In the meantime, I'm certainly thinking about Scoble's parting strategic comments:

* The Mini-Microsoft blog's impact has come, been done, and is past.

* The blog serves now to harm Microsoft more than help it.

* The blog is, specifically, being used by the anti-Microsoft crowd and competitors to harm Microsoft.

All good points, and some, worth putting up a pivotal post about.

But not today. Go have fun.

I find it hard to disagree with Robert's above points. The Mini-Microsoft blog has served as a place for Microsoft employees to discuss what riles them about the company in an anonymous setting that is free of recrimination. From my perspective, this has been both good and bad. It has been good to have a forum where people can discuss aspects of the culture that have been taken for granted but were actually harmful, such as The Curve, without fear of being attacked for questioning the status quo. Although it would have been better for this discussion to happen internally, there are a number of social and technological reasons why this is difficult.

On the flip side, the Mini-Microsoft blog is a forum where disgruntled employees pour out their bile on their fellow employees and the company as a whole. I've seen character assassination, racism, sexism, fear mongering, unfounded allegations of sexual misconduct, information leaking, and more in the comments section of the Mini-Microsoft blog. However you slice it, it reflects badly on Microsoft that the people posting these comments appear to be Microsoft employees. What is even more interesting is when you consider Robert Scoble's allegation below

Anonymous bloggers are never as credible as ones who stick their names on things.

Why does it bother me? Cause Mini is being used by non-Microsoft employees to hurt Microsoft. I've learned that a lot of the posts here that you're reading aren't done by Microsoft employees.

Yet you are taking it on face value that everyone is being straight up with you here. They are not.

I didn't realize this until after I had left Microsoft (it's funny how people tell you stuff when you aren't a Microsoft employee anymore). I'm not willing to expose my source, though. But I believe him.

That competitors would astroturf the Mini-Microsoft blog or use it as a recruiting tool when competing against Microsoft for a candidate doesn't surprise me. The surprise is that both Mini-Microsoft and Robert Scoble seem to be taken aback by this. I guess I'm more cynical than most.

The bottom line is that I agree with Robert that in its current incarnation Mini-Microsoft does more harm to Microsoft than good. If anything, it does point out the need for better internal forums for frank and open discussion, but I definitely think its time has passed.


 

Categories: Life in the B0rg Cube

Mike Arrington of TechCrunch fame has a blog post where he lays out the demographics of the various RSS readers used to subscribe to his feed. Below is an excerpt of his post and a partial screenshot of his FeedBurner statistics showing the top fourteen feed readers used to access the TechCrunch feed

Firefox (including Flock) accounts for 20% of feed readers. Bloglines is in second place with 13%, followed by NewsGator at 12%, Rojo at 8%, FeedReader at 7%, and Netvibes at 7%. Other notables include Pageflakes, Pluck and Attensa. If you add NetNewsWire to the core NewsGator stats, NewsGator is actually bigger than bloglines.
...

The feed reader statistics are surprising to me both for the feed readers that show up in the list and for those that don't. For example, I'm surprised to see FeedReader at #5 yet not see FeedDemon in the top 14. Similarly, the popularity of AJAX home pages like Pageflakes and Netvibes over those from the big 3 (Google/Yahoo/Microsoft) is also unexpected. Of course, these statistics might be skewed because TechCrunch is one of the default feeds in Netvibes. A final surprise is that NewsGator Online is almost as popular as Bloglines among readers of TechCrunch. This seems to mean that the former is finally getting a lot of cred among the early adopter crowd, especially since the latter has been slow to update in the past year.

For a completely different set of demographics, here are the top 14 feed readers used to access my RSS feed according to FeedBurner.

I wonder what conclusions you would draw from how different the distribution of feed readers is in the above screenshots. For example, I think the fact that a bunch of Microsoft employees and developers on Microsoft's platforms read my blog explains why there are multiple instances of feed readers based on the .NET Framework in the above list. In addition, I suspect this also explains why there is an entry for the Windows RSS platform in the top 10 applications hitting my feed.

On the flip side, I have no explanation for why it seems that NewsGator Online is half as popular as Bloglines among the readers of my blog.


 

Richard MacManus has a blog post entitled Netscape Community Backlash where he writes

I've been tracking the release of the new Digg-style community news site Netscape.com, because there is a lot of backlash within the Netscape community about it. A story called Netscape's Blunder!!! was number 1 on Netscape.com for a while and the latest post on the homepage is entitled A Request by the Netscape Community to Bring Back Our Netscape.com. There's another Netscape story currently on the homepage called Netscape Reborn: Why? Why? Why?. The backlash has presumably led to this message currently on the right of the homepage, from the Netscape team:

"Attention Netscape users Your Netscape mail hasn't gone anywhere, you can find it right here! Also, My.Netscape and your Stock Quotes are still online as well."

There appears to be a genuine feeling of betrayal by the (very large) set of users who have had Netscape.com as their homepage for some time. Indeed I've been getting comments on my own posts and even emails from Netscape users, upset about the change to the Digg style.

All of this shows how passionate people can get about their Web homepage - and they're just as much a 'community' as the Digg.com users are. It's just that they like the old-school Web homepage, not the new Digg style. Also what this tells me is that while a lot of us geeks and 2.0 types are addicted to our own technology (and our own voices, to be honest), it's pretty darn obvious that A LOT of people want to stick with the status quo.

This is one of those reasons why I believe that Danah Boyd's essays should be required reading for anyone interested in building social software. I disagree with Richard MacManus that the problem is that a lot of people want to stick with the status quo. I agree that it plays a part, but the real problem is that AOL made a drastic change to software that was an integral part of their users' lives in such a draconian manner.

People grow attached to the software they use and the online community that exists around that software. Heck, I've been using My Yahoo! for the past five or six years and have only partially switched to Live.com even though I made a conscious decision to switch*. I'd personally be pretty irritated if one day Yahoo! radically switched things around in a desperate attempt to jump on the Web 2.0 bandwagon and I'm a tech geek.

AOL should have engaged with their community of users before launching the revamped Digg-like version of Netscape. At the very least, the company should have considered using an alternate URL for the site and not the valuable Netscape.com domain or done some A/B testing to see if users liked the switch over or not. It may be that the people complaining are a vocal minority but something tells me that they aren't given how drastic the change to the site has been. Perhaps making Live.com and MSN.com separate sites wasn't such a bad idea after all. :)

* I use Live.com at work and My Yahoo! at home.


 

Categories: Social Software

July 2, 2006
@ 06:29 AM

Last week I attended the Kenny Chesney concert with my girlfriend and we even took some photos before the concert. A couple of coworkers answered my call for country duds and I got some hats, shirts and a pair of cowboy boots contributed to the cause. I probably should write a review of the concert but it's hard for me to judge the musical quality of a concert that had people singing songs like She Thinks My Tractor is Sexy and Save a Horse, Ride a Cowboy. However, here are a few observations from the concert:
  • There were supposedly over 40,000 tickets sold and it looked like there were tens of thousands of people there. However, the crowd wasn't very diverse; it was almost all white guys and white gals. I was the only black person I saw the entire 5.5 hours we were there.

  • Besides Kenny Chesney there was also Gretchen Wilson, Dierks Bentley, Big & Rich, and a surprise appearance by Uncle Kracker. The crowd seemed to get into all the performances although it was hard for me to since I didn't know most of the songs.

  • I think I saw someone with the worst job in America. One of the concert goers vomited and it seems there were no safety cones available so one of the stadium employees stood over the vomit so that concert goers wouldn't step on it.

  • Unlike hip hop concerts this one started on time. We got there at 5:30PM and we had already missed half of Dierks Bentley's set. Not only did it start early, it ran until 11 PM which means we got our money's worth.

  • This was the largest gathering of people wearing cowboy hats I'd ever seen. This was doubly a surprise given how rarely one encounters cowboy hats in Seattle.


 

Categories: Personal

Cory Doctorow has a blog post up on Boing Boing entitled Mark Pilgrim's list of Ubuntu essentials for ex-Mac users where he writes

Mac guru and software developer Mark Pilgrim recently switched to Ubuntu Linux after becoming fed up with proprietary Mac file-formats and the increasing use of DRM technologies in the MacOS. I've been a Mac user since 1984, and have a Mac tattooed on my right bicep. I've probably personally owned 50 Macs, and I've purchased several hundred while working as an IT manager over the years. I'm about to make the same switch, for much the same reasons.

You could probably write an entire Ph.D. dissertation on what would motivate someone to tattoo a corporate logo on their arm. Maybe I should buy a Mac just so I can figure out what all the hype is about.


 

June 30, 2006
@ 04:28 AM

It seems the Web API authentication discussion has been sparked up all over the Web by the various announcements of Windows Live ID and the Google Account Authentication for Web apps. In his blog post Google's authentication vs. Microsoft's Live ID Eric Norlin writes

Recent announcements of Google's authentication service have prompted comparisons to Passport, and even gotten to Dick Hardt (of "Identity 2.0" fame) to call it the, "deepening of the identity silo." I'd like to contrast Google's work with Microsoft's recent work around Live ID.

Microsoft's Live ID *is* the old Passport — with a few key changes. Kim Cameron's work around the identity metasystem has driven the concept of InfoCards (now called CardSpace) deep inside of Microsoft. In essence, Kim's idea is that there is a "metasystem" which utilizes WS-Trust to translate tokens, so that all identity systems can interact with each other.

Of extreme importance is the fact that Windows Live ID will support WS-Trust, WS-Federation, CardSpace and ADFS (active directory federation server). This means that A) Windows Live ID can interact with other identity metasystem implementations (Open Source versions, for example); B) that your corporate active directory environment can be federated into Windows Live ID; and C) the closed system that was Passport has now effectively been transformed into an open (standards-based) and transparent system that is Live ID.

Contrast all of this with Google's announcement: create Google account, store user information at Google, get authentication from Google — are we sensing a trend? While Microsoft is now making it easy to interact with other (competing) identity systems, Google is making it nearly impossible. All of which leads one to ask - why?

Perhaps it's because there are now so many old-school Microsoft people at Google? ;)

On a more serious note, I suspect that the Google folks simply didn't think about the federation angle when designing the authentication model for their APIs as opposed to this being some 'evil plot' by Google to create an identity silo.