June 11, 2006
@ 08:58 PM

Robert Scoble has posted an entry entitled Correcting the Record about Microsoft where he confirms statements by Dave Winer and Chris Pirillo that he is leaving Microsoft for PodTech.

This seems like a good move for Robert. He's always seemed more at home with his role as an amateur technology journalist than as the unofficial spokesman for Microsoft. Congrats to the PodTech folks for making a great hire and good luck to Robert on his new adventure. He definitely will be missed over here in the B0rg Cube.


 

Categories: Life in the B0rg Cube

I have a recently purchased Dell laptop which, as a consequence of the Google<->Dell deal, comes with a bunch of Google utilities installed, such as the Google Toolbar and Google Desktop. In addition, it had http://www.google.com/ig/dell as the default home page. One thing I've noticed is that every once in a while when I type a URL in the browser address bar, instead of going to the web page I get search ads instead. Below is a screenshot of what happened when I typed http://www.apartments.com/ in the address bar recently.




Weird huh?

 

Categories: Ramblings

June 9, 2006
@ 08:06 PM

Windows Live Dev is now live. Wondering what it is? Check out the answer at What is Dev?

Windows Live Dev is your one-stop shop for the Windows Live Platform, including information on getting started with Windows Live services, latest documentation and APIs, samples, access to community areas and relevant blogs, and announcements of future releases and innovations.

Windows Live Dev is a new site and will be growing over time, adding more content and features. Please, let us know what you think of the site and what you’d like to see in the future. Post to our “Chatter” forum and start a conversation about what you’d like to see. We’ll see you there.

About the Windows Live Platform

The Windows Live Platform puts a deeper level of control into developers' hands by offering access to the core services and data through open, easily accessible APIs. Now you can build applications and mashups that combine your innovation with the power of Windows Live services and social relationships.

This is awesome. This is the third developer website I've been a part of launching at Microsoft (http://msdn.microsoft.com/xml and http://msdn.microsoft.com/msn are the others) and yet I never get over how great it feels to see something go from an idea on a whiteboard to reality. I wanted the URL of the site to be http://developer.live.com but Brian (our VP) suggested http://dev.live.com which definitely has a better ring to it. Check it out and let us know what you think.

I'll be doing the group blog thing at http://dev.live.com/spaces. Expect interesting new API announcements in the coming months and perhaps even a peep or two about the rumored support for gadgets coming to Windows Live Spaces.


 

Categories: Windows Live

Tim O'Reilly ran a series of blog posts a few months ago on the O'Reilly Radar blog entitled "Database War Stories" where he had various folks from Web companies talk about their issues scaling databases. The series had an interesting collection of anecdotes, so I'm posting this here to have a handy link to the posts as well as the highlights from each entry.

  1. Web 2.0 and Databases Part 1: Second Life: Like everybody else, we started with One Database All Hail The Central Database, and have subsequently been forced into clustering. However, we've eschewed any of the general purpose cluster technologies (mysql cluster, various replication schemes) in favor of explicit data partitioning. So, we still have a central db that keeps track of where to find what data (per-user, for instance), and N additional dbs that do the heavy lifting. Our feeling is that this is ultimately far more scalable than black-box clustering.

  2. Database War Stories #2: bloglines and memeorandum: Bloglines has several data stores, only a couple of which are managed by "traditional" database tools (which in our case is Sleepycat). User information, including email address, password, and subscription data, is stored in one database. Feed information, including the name of the feed, description of the feed, and the various URLs associated with feed, are stored in another database. The vast majority of data within Bloglines however, the 1.4 billion blog posts we've archived since we went on-line, are stored in a data storage system that we wrote ourselves. This system is based on flat files that are replicated across multiple machines, somewhat like the system outlined in the Google File System paper, but much more specific to just our application. To round things out, we make extensive use of memcached to try to keep as much data in memory as possible to keep performance as snappy as possible.

  3. Database War Stories #3: Flickr: tags are an interesting one. lots of the 'web 2.0' feature set doesn't fit well with traditional normalised db schema design. denormalization (or heavy caching) is the only way to generate a tag cloud in milliseconds for hundreds of millions of tags. you can cache stuff that's slow to generate, but if it's so expensive to generate that you can't ever regenerate that view without pegging a whole database server then it's not going to work (or you need dedicated servers to generate those views - some of our data views are calculated offline by dedicated processing clusters which save the results into mysql).

  4. Database War Stories #4: NASA World Wind: Flat files are used for quick response on the client side, while on the server side, SQL databases store both imagery (and, soon to come, vector files). However, he admits that "using file stores, especially when a large number of files are present (millions) has proven to be fairly inconsistent across multiple OS and hardware platforms."

  5. Database War Stories #5: craigslist: databases are good at doing some of the heavy lifting, go sort this, give me some of that, but if your database gets hot you are in a world of trouble so make sure you can cache stuff up front. Protect your db!

    you can only go so deep with a master -> slave configuration; at some point you're gonna need to break your data over several clusters. Craigslist will do this with our classified data sometime this year.

    Do Not expect FullText indexing to work on a very large table.

  6. Database War Stories #6: O'Reilly Research: The lessons:

    • the need to pay attention to how data is organized to address performance issues, to make the data understandable, to make queries reliable (i.e., getting consistent results), and to identify data quality issues.
    • when you have a lot of data, partitioning, usually by time, can make the data usable. Be thoughtful about your partitions; you may find it's best to make asymmetrical partitions that reflect how users most access the data. Also, if you don't write automated scripts to maintain your partitions, performance can deteriorate over time.
  7. Database War Stories #7: Google File System and BigTable: Jeff wrote back briefly about BigTable: "Interesting discussion. I don't have much to add. I've been working with a number of other people here at Google on building a large-scale storage system for structured and semi-structured data called BigTable. It's designed to scale to hundreds or thousands of machines, and to make it easy to add more machines to the system and automatically start taking advantage of those resources without any reconfiguration. We don't have anything published about it yet, but there's a public talk about BigTable that I gave at University of Washington last November available on the web (try some searches for bigtable or view the talk)."

  8. Database War Stories #8: Findory and Amazon: On Findory, our traffic and crawl is much smaller than sites like Bloglines, but, even at our size, the system needs to be carefully architected to be able to rapidly serve up fully personalized pages for each user that change immediately after each new article is read. Our read-only databases are flat files -- Berkeley DB to be specific -- and are replicated out using our own replication management tools to our webservers. This strategy gives us extremely fast access from the local filesystem. We make thousands of random accesses to this read-only data on each page serve; Berkeley DB offers the performance necessary to be able to still serve our personalized pages rapidly under this load. Our much smaller read-write data set, which includes information like each user's reading history, is stored in MySQL. MySQL MyISAM works very well for this type of non-critical data since speed is the primary concern and more sophisticated transactional support is not important.

  9. Database War Stories #9 (finis): Brian Aker of MySQL Responds: Brian Aker of MySQL sent me a few email comments about this whole "war stories" thread, which I reproduce here. Highlight -- he says: "Reading through the comments you got on your blog entry, these users are hitting on the same design patterns. There are very common design patterns for how to scale a database, and few sites really turn out to be all that original. Everyone arrives at certain truths, flat files with multiple dimensions don't scale, you will need to partition your data in some manner, and in the end caching is a requirement."

    I agree about the common design patterns, but I didn't hear that flat files don't scale. What I heard is that some very big sites are saying that traditional databases don't scale, and that the evolution isn't from flat files to SQL databases, but from flat files to sophisticated custom file systems. Brian acknowledges that SQL vendors haven't solved the problem, but doesn't seem to think that anyone else has either.

I found most of the stories interesting, especially the one from the Flickr folks. Based on some early thinking I did around tagging-related scenarios for MSN Spaces, I'd long since assumed that you'd have to throw out everything you learned in database class at school to build anything truly performant. It's good to see that confirmed by more experienced folks.
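The explicit partitioning pattern Second Life describes above, a small central database that records which shard holds each user's data, can be sketched in a few lines. This is only an illustrative toy (the class, the round-robin placement policy and all names are mine, not any of these sites' actual schemas):

```python
# Minimal sketch of explicit data partitioning: a central directory maps
# each user to the shard (database) that owns that user's data.
# All names and the placement policy are illustrative, not any real site's.

class ShardRouter:
    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.directory = {}  # the "central db": user_id -> shard_id

    def assign(self, user_id):
        # Place new users on a shard; here, naive round-robin by count.
        shard_id = len(self.directory) % self.num_shards
        self.directory[user_id] = shard_id
        return shard_id

    def shard_for(self, user_id):
        # Every query first consults the central directory, then goes
        # straight to the owning shard for the heavy lifting.
        return self.directory[user_id]

router = ShardRouter(num_shards=4)
router.assign("alice")
router.assign("bob")
print(router.shard_for("bob"))
```

The appeal over black-box clustering is that the application controls exactly where data lives, at the cost of maintaining the directory itself.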

I'd have loved to share some of the data we have around the storage infrastructure that handles over 2.5 billion photos for MSN Spaces and over 400 million contact lists with over 8 billion contacts for Hotmail and MSN Messenger. Too bad the series is over. Of course, I probably wouldn't have gotten the OK from PR to share the info anyway. :)


 

The MSN Spaces team's blog has a few entries about one of the projects they've been working on in collaboration with our team. From the posts URL Changes and More info on the URL Changes we learn

All MSN Spaces Users:

Please note that your MSN Space's URL will change on June 8, 2006 (originally announced as June 5).  As part of investments in the improvement of MSN Spaces, we will be migrating all of the URLs from http://spaces.msn.com/<NAME> to http://<NAME>.spaces.msn.com.   (For instance, instead of http://spaces.msn.com/thespacecraft/ you will now see http://thespacecraft.spaces.msn.com.)

 On and after June 8th, all viewers and users going to the "old" URL will be automatically redirected to the new URL.
...
Spaces has grown very quickly into one of the Web’s mega services. So quickly in fact that we just passed the 100MM user mark and have had to do some architectural changes to ensure that Spaces can be deployed in multiple data centers. We needed to deliver a system that allows for Spaces to be distributed across multiple data centers without requiring a URL that included the data center name. How unkewl would that be? Can you imagine telling your friends and family that the URL to your space was http://cluster25.dc1.spaces.msn.com/gphipps?

So we have developed a DNS (Domain Name System) based solution that allows us to redirect requests to the right data center and allows us to keep a better looking URL. Moving the Space name into the domain name is a requirement of that.
...
Doing the rearchitecture work and making the move to Live Spaces was not possible for a number of technical reasons. This is why we can’t move straight to the spaces.live.com name. However, we believe that when we do move to Live Spaces that will be the last time we have to change the URL. This really isn’t something we decided to do lightly. We have had to make a ton of tradeoffs from both a technical perspective and the impact to our users.

Converting a service as large as MSN Spaces and its associated services from a single data center to one that can be deployed in multiple data centers has been a significant undertaking. One unfortunate side effect is that we've had to alter the URL structure of MSN Spaces. Doubly unfortunate is that the URL structure will change again with the switch to Windows Live Spaces.
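The hostname-based routing the team describes, where the space name in the subdomain lets DNS resolve each space to the right data center, can be illustrated with a toy lookup. The function and the data-center names below are made up for illustration; the actual routing happens in DNS, not application code:

```python
# Toy illustration of subdomain routing: because the space name is part
# of the hostname, a DNS-style lookup can send each space to its data
# center without the data center ever appearing in the URL.
# The data-center names are hypothetical.

def space_from_host(hostname):
    # http://<NAME>.spaces.msn.com -> <NAME>
    suffix = ".spaces.msn.com"
    if hostname.endswith(suffix):
        return hostname[:-len(suffix)]
    return None

data_center = {"thespacecraft": "dc-west", "gphipps": "dc-east"}

name = space_from_host("thespacecraft.spaces.msn.com")
print(data_center.get(name))
```

With the old path-based URLs (http://spaces.msn.com/<NAME>), every request had to hit a single front door before the space name was even visible, which is why the rename was a prerequisite for the multi-data-center move.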

Although these changes suck, they are necessary to ensure that we can continue to handle the explosive growth of the service across the world as well as pump out crazy new features. Thanks to all our users who have to bear with these changes. 


 

Categories: Windows Live

Samir from the Windows Live Expo team has posted an entry entitled Attention Developers: Expo API RTW where he writes

The first version of the Expo API went live this week.  The API gives developers read access to the Expo classifieds listings database.  All our listings are geo-tagged, so there are some cool possibilities for mashups with some of those mapping APIs.
 
The API docs are published on MSDN and you can find them here.  You can get started by signing up for an Application key at http://expo.live.com/myapikeys.aspx.
 
If you've got a slick working demo using our APIs, please let us know at expoapi@microsoft.com so we can link to it.  The nuttier and more creative, the better!  Also, make sure you post a link from this blog post.

I took a gander at the API before it shipped and there are a number of things I like about it. The first is that the API has both a RESTful interface and a SOAP interface. Even better, the RESTful interface is powered by RSS with a couple of extensions thrown in. This API, and others like it that we have planned, is one of the reasons I've been thinking about Best Practices for Extending RSS and Atom. On the other hand, I don't like that the API takes latitudes/longitudes instead of addresses and zip codes. This means that if I want to write an app that uses the Expo API, I also need to use a geocoding API. It may make it easier to integrate Windows Live Expo into map-based mashups though. I've asked Samir about also accepting addresses and zip codes in the API but I suspect the team will need to get the request from a couple more folks before they take it seriously. :(
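The two-step dance this forces on clients looks roughly like the sketch below. To be clear, the endpoint, parameter names and the stub geocoder are all hypothetical, invented only to show the shape of the problem, not the actual Expo API:

```python
# Hypothetical sketch of the two-step flow described above: geocode an
# address first, then query a listings API that only accepts lat/long.
# The endpoint URL, parameter names and geocoder are made up.
from urllib.parse import urlencode

def geocode(address):
    # Stand-in for a call to a real geocoding API.
    known = {"Seattle, WA": (47.6062, -122.3321)}
    return known[address]

def build_listings_url(address, radius_miles=5):
    lat, lon = geocode(address)
    query = urlencode({"lat": lat, "long": lon, "radius": radius_miles})
    return "http://example.com/listings?" + query

print(build_listings_url("Seattle, WA"))
```

If the API accepted addresses and zip codes directly, the geocode step (and the dependency on a second service) would disappear for the common case.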

I've promised the Expo folks that I'll write a Live.com gadget using the API which I hope to get started on soon. Given that I've also promised the folks at O'Reilly's XML.com an article on building Live.com gadgets by next week, I better get cracking. It's a good thing I have a copious amount of free time. ;)


 

Categories: Windows Live

Joe Wilcox of Jupiter Research has a blog post entitled Google My Spreadsheet where he talks about Google's recently announced Web-based spreadsheet application. He writes

So, Google is doing a spreadsheet. On the Web. With collaboration. And presumably--if and when released--Google Spreadsheet will be available for free. I predict there will be crisis meetings at Microsoft today. I totally agree with Colleague David Card that "Google is just playing with Microsoft's (hive) mind. Scaring the troops. Sleight-of-handing the managers."

Perhaps the real Google competitive threat isn't any product or products, but the information vendor's ability to rustle Microsoft corporate paranoia. To get Microsoft chasing Google phantoms, and in the process get Microsoft's corporate mind off its core business. News media will be gaga with Google competing with Microsoft stories--two juggernauts set to collide. Yeah, right. I'd be shocked if Google ever released a Web browser, operating system or desktop productivity suite. Those markets aren't core to Google's business, contrary to speculation among news sites and bloggers.

As for the spreadsheet, which isn't really available beyond select testers, what's it really going to do? David is right, "Consumers don't use spreadsheets. No thinking IT manager would sign off on replacing Excel with a Web-based spreadsheet." Google's target market, if there really is one, appears to be casual consumer and small business users of spreadsheets--people making lists. OK, that competes with Microsoft how? So soccer mom or jill high schooler can work together with other people from the same Web-based spreadsheet. Microsoft shouldn't really sweat that, although Microsoft might want to worry about what Google might do with extending search's utility.

I agree 100% with Joe Wilcox's analysis here. This seems more like a move by Google to punk Microsoft into diverting focus from its core efforts than a well-thought-out product category. At the recent Google Press Day, I recall Eric Schmidt mentioning that they have not been doing a good job of following the 70/20/10 principle (70 percent focus on the core business, 20 percent on related businesses and 10 percent on new businesses).

If I were a Google investor, I'd be worried that their search engine's relevance is deteriorating (a Google search for "msdn system.string" doesn't find this page in the top 10 results) and that they are wasting resources fragmenting their focus in myriad ways. As a competitor, it makes me smile. ;)

Update: From the Google blog post entitled It's nice to share, it looks like this is an offering from the creators of XL2Web, who were acquired by Google a year ago. Another example of innovation by acquisition at Google? Interesting.

 

Guy Kawasaki has a blog post entitled The Top Sixteen Lies of CEOs which has the following item on the list

2. “It’s like a startup around here.” This could mean that the place lacks adult supervision; capital is running out; the product is behind schedule; investors have given up, and employees are paid below market rates. Sure, it could alternatively mean that the company is energized, entrepreneurial, making meaning, and kicking butt, but just be sure to double check.

One thing that amuses me at work is that every once in a while I see some internal team on a recruitment drive with one of the selling points being that "working for us is like working at a startup". This seems pretty amusing on the face of it, given that most of the characteristics one associates with working at a startup are negative, as Guy suggests. Paul Graham pointed out in his essay How to Make Wealth that the main attraction of working at a startup is that you can get really rich from it. That's what makes being underpaid, overworked and in constant fear of your competitors worth it in the end.

Unless your team can guarantee folks a huge chunk of change if your product is successful, working there is not like working at a startup. There are lots of ways to describe how cool your workplace is without resorting to the flawed comparison to working at a startup.

Thanks for listening.
 

Categories: Life in the B0rg Cube

Seeing Jon Udell's post about his difficulty with the Google PR team with regard to discussing the Google GData API reminded me that I needed to write down some of my thoughts on extending RSS and Atom based on looking at GData. There are basically three approaches one can take when extending an XML syndication format such as RSS or Atom:

  1. Add extension elements in a different namespace: This is the traditional approach to extending RSS and it involves adding new elements as children of the item or atom:entry element which carry application/context specific data beyond that provided by the RSS/Atom elements. Microsoft's Simple Sharing Extensions, Apple's iTunes RSS extensions, Yahoo's Media RSS extensions and Google's GData common elements all follow this model.

  2. Provide links to alternate documents/formats as payload: This approach involves providing links to additional data or metadata from an item in the feed. Podcasting is the canonical example of this technique. One argument for this approach is that instead of coming up with extension elements that replicate existing file formats, one should simply embed links to files in the appropriate formats. This argument has been used in various discussions on syndicating calendar information (i.e. iCalendar payloads) and contact lists (i.e. vCard payloads). See James Snell's post Notes: Atom and the Google Data API for more on this topic.

  3. Embed microformats in [X]HTML content: A microformat is structured data embedded within another markup language (typically HTML/XHTML). This allows one to represent both human-readable data and machine-readable data in a single document. The Structured Blogging initiative is an example of this technique.

All three approaches have their pros and cons. Option #1 is problematic because it encourages a proliferation of duplicative extensions and may lead to fragmenting the embedded data into multiple unrelated elements instead of a single document/format. Option #2 requires RSS/Atom clients to either build parsers for non-syndication formats or rely on external libraries for consuming information in the feed. The problem with Option #3 above is that it introduces a dependency on an HTML/XHTML parser for extracting the embedded data from the content of the feed.
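As a concrete illustration of Option #1, here's how a client might read a namespace-qualified extension element out of an RSS item using Python's ElementTree. The http://example.com/ext namespace and the rating element are invented for illustration; real extensions like iTunes' or Media RSS work the same way:

```python
# Option #1 in practice: an extension element lives in its own namespace
# alongside the standard RSS elements. The "ex" namespace and the
# <ex:rating> element are invented for this illustration.
import xml.etree.ElementTree as ET

rss = """<rss version="2.0" xmlns:ex="http://example.com/ext">
  <channel>
    <item>
      <title>Hello</title>
      <ex:rating>5</ex:rating>
    </item>
  </channel>
</rss>"""

ns = {"ex": "http://example.com/ext"}
item = ET.fromstring(rss).find("channel/item")
title = item.find("title").text           # standard RSS element
rating = item.find("ex:rating", ns).text  # namespaced extension element
print(title, rating)
```

A client that doesn't understand the extension namespace simply ignores those elements, which is what makes this the traditional, low-friction way to extend a feed.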

From my experience with RSS Bandit, I have a preference for Option #1, although there is a certain architectural purity to Option #2 that appeals to me. What do the XML syndication geeks in the audience think about this?


 

The Windows Live Mail Desktop team has a blog post entitled Better Together with Active Search where they talk about a new feature currently called "Active Search". The post is excerpted below

Much of what you need to get done online – from planning your next vacation to remembering to buy flowers for your mom on her birthday – is piling up in your inbox, just waiting for you to take action, usually by looking something up on the web.

With this in mind, we’ve designed Active Search to make it easier for you to act on anything that piques your interest while reading your email. That’s why we show you key search terms we find in a message and provide a search box right underneath, so you can quickly search for terms of your own.

We also show search results and sponsored links right inline, so you can see what’s related to your message on the web, without having to open a new browser window. Of course, if you come across something really interesting, just click More results… and we’ll open a new window with a full set of search results for you to dive into.

Because we only look for relevant keywords in the current email message or RSS article you happen to be viewing in your inbox, there are times when we just can’t find anything relevant enough to show you. So we don’t – we just show a search box ready for you to enter search terms you happen to come up with while reading the message.

I got a demo of this feature from Bubba in the cafeteria a few weeks ago and it seemed pretty interesting. It reminds me of the text ads in GMail, but for a desktop application and with a few other key differences. What I'd love to know is whether there is a plan to make some of this stuff available as APIs for non-Windows Live applications. I wouldn't mind being able to integrate search ads into RSS Bandit to offset some of our hosting costs.
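The team hasn't said how Active Search actually picks its key terms, but the general idea of surfacing candidate search terms from a message can be sketched with a naive frequency count. This toy (the stop-word list and ranking are mine) is only meant to show the shape of the problem:

```python
# Naive sketch of pulling candidate search terms out of a message by
# word frequency, ignoring common stop words. How Active Search really
# ranks terms is not public; this is purely a toy illustration.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "to", "of", "and", "for", "on", "in", "is"}

def key_terms(text, n=3):
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [w for w, _ in counts.most_common(n)]

msg = "Flights to Maui in June: Maui hotels and Maui car rentals for June."
print(key_terms(msg))
```

A real implementation would presumably weigh things like named entities and message structure rather than raw frequency, but the output (a few terms plus a search box) is the same.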


 

Categories: Windows Live