One of the side effects of working for a large, successful, multinational corporation is that you tend to lose your sense of perspective. For example, take this post from the Official Google blog entitled Cookies: expiring sooner to improve privacy which states

We are committed to an ongoing process to improve our privacy practices, and have recently taken a closer look at the question of cookie privacy. How long should a web site "remember" cookie information in its logs after a user's visit? And when should a cookie expire on your computer? Cookie privacy is both a server and a client issue.

On the server side, we recently announced that we will anonymize our search server logs — including IP addresses and cookie ID numbers — after 18 months.
In the coming months, Google will start issuing our users cookies that will be set to auto-expire after 2 years, while auto-renewing the cookies of active users during this time period. In other words, users who do not return to Google will have their cookies auto-expire after 2 years. Regular Google users will have their cookies auto-renew, so that their preferences are not lost. And, as always, all users will still be able to control their cookies at any time via their browsers.

What’s interesting in this post is that Google has sidestepped the actual privacy issue that has many people concerned about the amount of knowledge the company has about Internet users. Numerous bloggers such as Nelson Minar, Shelley Powers and John Dowdell have already pointed out how this change doesn't actually change the status quo. In today’s world, Google knows more about most Internet users than their spouses do. Thanks to the magic of HTTP cookies, Google remembers...

You pretty much can't use the Web without running into a Google cookie. So it seems somewhat facetious for Google to claim that if you can avoid using the Internet for two years then they'll forget everything they are storing about you. Oops, actually they don't even claim that. They simply claim that they’ll stop associating your old data with your current usage, if you manage to avoid hitting a Google cookie for two years. 

If Google really wanted to address people's privacy concerns they’d blog about how they plan to use and protect all the data they are collecting about Internet users from all of their services instead of making ineffective token gestures that are specific to one service.      

Now playing: Lil Boosie & Webbie - Wipe Me Down (feat. Foxx)


Disclaimer: This may sound like a rant but it isn't meant to be. In the wise words of Raymond Chen this is meant to highlight problems that are harming the productivity of developers and knowledge workers in today's world. No companies or programs will be named because the intent is not to mock or ridicule. 

This morning I had to rush into work early instead of going to the gym because of two limitations in the software around us.

Problem #1: Collaborative Document Editing

So a bunch of us are working on a document that is due today. Yesterday I wanted to edit the document but found out I could not because the software claimed someone else was currently editing it. So I opened it in read-only mode, copied out some data, edited it and then sent my changes in an email to the person who was in charge of the document. As if that wasn’t bad enough…

This morning, as I'm driving to the gym for my morning workout, I glance at my phone to see that I've received mail from several co-workers because I've "locked" the document and no one can make their changes. When I get to work, I find out that I didn’t close the document within the application and this was the reason none of my co-workers could edit it. Wow.

The notion that only one person at a time can edit a document or that if one is viewing a document, it cannot be edited seems archaic in today’s globally networked world. Why is software taking so long to catch up?

Problem #2: Loosely Coupled XML Web Services

While I was driving to the office I noticed another email from one of the services that integrates with ours via a SOAP-based XML Web Service. As part of the design to handle a new scenario we added a new type that was going to be returned by one of our methods (e.g. imagine that there was a GetFruit() method which used to return apples and oranges which now returns apples, oranges and bananas). This change was crashing the applications that were invoking our service because they weren’t expecting us to return bananas.

However, the insidious thing is that the failure wasn’t because their application was improperly coded to fail if it saw a fruit it didn’t know; it was because the platform they built on was statically typed. Specifically, the Web Services platform automatically converted the XML to objects by looking at our WSDL file (i.e. the interface definition language which stated up front which types are returned by our service). So this meant that any time new types were added to our service, our WSDL file would be updated and any application invoking our service which was built on a Web services platform that performed such XML<->object mapping and was statically typed would need to be recompiled. Yes, recompiled.
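The alternative design, and the reason loosely coupled consumers survive this kind of change, is to process the XML directly and simply skip elements you don't recognize instead of failing on them. A quick sketch of that approach (the XML payload and the KNOWN_TYPES set below are hypothetical, standing in for the GetFruit() example):

```python
import xml.etree.ElementTree as ET

# Hypothetical response from GetFruit() after the service started
# returning a new <banana> element that older clients have never seen.
response = """<fruits>
  <apple variety="fuji"/>
  <orange variety="navel"/>
  <banana variety="cavendish"/>
</fruits>"""

KNOWN_TYPES = {"apple", "orange"}

def parse_fruits(xml_text):
    """Loosely coupled parsing: handle the elements you know and
    skip the ones you don't, instead of crashing the way a
    statically generated WSDL proxy would."""
    known, skipped = [], []
    for child in ET.fromstring(xml_text):
        (known if child.tag in KNOWN_TYPES else skipped).append(child.tag)
    return known, skipped

known, skipped = parse_fruits(response)
print(known)    # ['apple', 'orange'] -- processed normally
print(skipped)  # ['banana'] -- ignored rather than fatal
```

The key design choice is that unknown content degrades gracefully: the client's view of the data gets no worse when the service adds types, and nothing needs to be recompiled.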

Now, consider how many potentially different applications that could be accessing our service. What are our choices? Come up with GetFruitEx() or GetFruit2() methods so we don’t break old clients? Go over our web server logs and try to track down every application that has accessed our service? Never introduce new types? 

It’s sad that as an industry we built a technology on the eXtensible Markup Language (XML) and our first instinct was to make it as inflexible as two-decade-old technology that was never meant to scale to a global network like the World Wide Web.

Software should solve problems, not create new ones which require more technology to fix.

Now playing: Young Jeezy - Bang (feat. T.I. & Lil Scrappy)


Categories: Technology | XML | XML Web Services

July 16, 2007
@ 06:57 PM

Charlie Kindel just announced on the Windows Home Server team blog that the final version of the software has been released to manufacturing (RTM). This means that you'll be able to buy a dedicated home server from Fujitsu-Siemens, Gateway, HP, Iomega, Lacie or Medion in the next few months.

I wonder if that means we'll soon be seeing the following ad on TV?


Categories: Life in the B0rg Cube

I just read two blog posts this morning about big shots in the technology industry who are on Facebook. The posts were Jeff Pulver's Goodbye LinkedIn. Hello Facebook. and Robert Scoble's Why Facebook, why now?

Both posts got me wondering how many execs from major Internet companies like Google, Yahoo!, and Microsoft you could actually find on Facebook. One of the cool things about the search engine on Facebook is that it shows you what “networks” the people who match the query are members of. Since a person has to have an email address from that domain (either a corporate email address or a college alumni email address), you can be fairly certain that the profile isn’t fake. Here’s who I found after a few minutes of quick searches; you may have to be a member of Facebook to see the search results.

Google Execs on Facebook

  1. Marissa Mayer (view friends): Found via
  2. Adam Bosworth (view friends): Found via
  3. Kai-Fu Lee (view friends): Found via

Yahoo! Execs on Facebook

  1. Brad Garlinghouse (view friends): Found via
  2. Jeff Weiner (view friends): Found via
  3. Qi Lu (view friends): Found via

Microsoft Execs on Facebook

  1. Ray Ozzie (view friends): Found via
  2. Steven Sinofsky (view friends): Found via
  3. Steve Ballmer (friend's list marked private): Found via

How many of these folks do you think have a profile on MySpace? I think this is definitely another data point which validates danah boyd’s theory on the differences between the user base of both sites.


Categories: Social Software

July 16, 2007
@ 05:08 PM

I dreamt I bought an iPhone. It was one of those dreams that seems so real you wake up thinking it happened. I finally realized it was a dream after I found my old phone plugged into its charger and not a brand new phone.

How weird is that?


Categories: Personal

Yesterday I saw a blog post by John Battelle where he said he turned down a chance to be part of a discussion on a television show about whether Facebook is the next Google. This seems like a pretty absurd assertion on the face of it and the reason is nestled somewhere in the middle of Robert Scoble's post Why Facebook, why now? where he writes

Everything I do in Facebook is about interacting with people. For instance, at the top of my Facebook inbox right now is Ryan Coomer. The advertising next to him says “Try Forex Trading Today.” There is absolutely NO connection between who Ryan is and the advertising that’s put next to him.

Imagine if advertisers could “buy people.” I just clicked on Ryan’s profile, he’s into Running and Golf. Why don’t ads for running and golf gear get put onto his profile? Wouldn’t that make sense? He’s also a software developer. Where’s the Visual Studio advertisement? He’s into video games. Where’s the Halo 3 advertisement?

Translation: Facebook needs an advertising platform and it needs one in the worst way. I’m not going to even look at the ads until the ads are tied to the people on Facebook. Facebook knows what we’re into, put ads for those things onto our profiles and messages.

Robert Scoble is both right and wrong. He is right that Facebook needs an advertising platform but he is wrong about how it should be used. If Facebook has detailed demographic data about all its users then it makes the most sense to show users what they are most interested in, not what the person they are currently interacting with on the site is most interested in. That would be counterproductive. I hate running and golf; I like G-Unit and alcopops. Which ads does it make more sense to show me when I'm browsing Ryan Coomer's profile?

Anyway, that's beside the point. Let's go back and look at what made Google such a compelling investment when they had their IPO. The company had

  1. An extremely popular website that served a key need for its users. 
  2. A way to make a metric ton of money from their popularity via search ads.

#1 is obvious but #2 needs further clarification. Search advertising created a paradigm shift in the advertising market in several ways. For one, it was the first time in the history of advertising that advertisers had a way to conclusively calculate the ROI of their advertising campaigns compared to other media (TV, newspapers, radio, sponsorships, etc). More importantly, it allowed advertisers to target people while they were in the process of making a buying decision (i.e. doing research). Most advertising is intrusive, irrelevant and often just plain annoying. On the other hand, ads shown when you are doing research for a commercial transaction are exactly what you want. It is quite telling that the highest paying search terms are ones like "mortgage refinance", "dui" and "consolidate loans". When searching for those terms I want to see ads; in fact, the more ads the better. What is most interesting is that even though the majority of searches on the Web are non-commercial, this was still a goldmine, one which could only get more lucrative as more people got on the Web and advertisers realized their dollars needed to follow the eyeballs. After all, search advertising is an easier expense to justify than almost any other ad campaign.

Now we know what made Google, a company that has since turned itself into a major force in the software industry and on the Web, such an attractive investment back in the day. So what does Facebook have?

  1. An extremely popular website that served a key need for its users. 
  2. The hope that they will be able to make a metric ton of money from their popularity via highly targeted ads based on all the rich demographic data they have about their users.

The key difference in #2 is that Facebook hasn't figured out how to make money yet. More importantly, if they do figure out how to build an ad platform it will likely be a system that is more complex than Google's AdWords and AdSense combined. When you think about it AdWords is pretty simple: allow people to bid on search terms, then show those people's ads when users search for those terms. AdSense is similarly straightforward: use term extraction to convert the content on the page into the equivalent of search terms, then show ads from the people who bid on those terms. Of course, the devil is in the details and is mainly about optimizing how much money can be extracted from their advertisers.

On the other hand, if you want to show display ads based on demographic data you'll also need to build something very similar to a recommendation system on top of everything else I've described, because you want to know that if I like wearing G-Unit clothing I probably wouldn't mind seeing ads for Ecko Unlimited wear, that if I like music by Young Jeezy I probably would like music from T.I., and that if I've read Freakonomics I probably wouldn't mind reading The Tipping Point.
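The simplest version of such a recommendation system is plain co-occurrence mining over user interests: recommend whatever most often appears alongside something you already like. A toy sketch (the profile data below is made up for illustration):

```python
from collections import Counter
from itertools import combinations

# Made-up interest profiles; the technique is simple co-occurrence:
# count how often each pair of interests shows up together.
profiles = [
    {"G-Unit", "Ecko Unltd", "Young Jeezy"},
    {"G-Unit", "Ecko Unltd"},
    {"Young Jeezy", "T.I."},
]

cooccur = Counter()
for interests in profiles:
    for a, b in combinations(sorted(interests), 2):
        cooccur[(a, b)] += 1
        cooccur[(b, a)] += 1

def recommend(interest, n=2):
    """Return the n items that co-occur most often with `interest`."""
    scores = Counter({b: c for (a, b), c in cooccur.items() if a == interest})
    return [item for item, _ in scores.most_common(n)]

print(recommend("G-Unit"))  # ['Ecko Unltd', 'Young Jeezy']
```

Real systems layer a lot more on top (similarity weighting, freshness, ad auction economics), which is exactly why this is a harder platform to build than keyword matching.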

Now let's assume that the developers at Facebook figure out how to build an advertising platform that is more complex than Google's AdWords and AdSense combined. There is still the question of whether the "spray and pray" approach that is the hallmark of display advertising [even if you are sure that the ads will be about the topics the user is interested in] is a more lucrative model than showing people ads when they are actually looking for something via search ads. I personally don't think it will be, but I'm no expert and I'm definitely no fortune teller.

I suspect that the folks at Facebook will eventually realize how hard a problem it is to monetize their users in a way that justifies the exorbitant valuation that has now been placed on them. When that time comes, I wouldn't be surprised if they find some sucker to buy them based on the promise that they present as opposed to the value that they actually bring. Almost like when Yahoo! paid $5 billion for the company that gave us the Mark Cuban we all know and love, in what also ended up as one of the top five worst billion dollar acquisitions of all time in the Internet space.


I should probably start out by pointing out that the title of this post is a lie. By definition, RESTful protocols cannot be truly SQL-like because they depend on Uniform Resource Identifiers (URIs aka URLs) for identifying resources. URIs on the Web are really just URLs, and URLs are really just hierarchical paths to a particular resource similar to the paths on your local file system (e.g. /users/mark/bobapples, A:\Temp\car.jpeg). Fundamentally, URIs identify a single resource or a set of resources. On the other hand, SQL is primarily about dealing with relational data, which means you write queries that span multiple tables (i.e. resources). A syntax for addressing single resources (i.e. URLs/URIs) is fundamentally incompatible with a query language that operates over multiple resources. This was one of the primary reasons the W3C created XQuery even though we already had XPath.

That said, being able to perform sorting, filtering, and aggregate operations over a single set of resources via a URI is extremely useful and is a fundamental aspect of the Web today. As Sam Ruby points out in his blog post Etymology, a search results page is fundamentally RESTful even though its URI identifies a query as opposed to a specific resource or set of resources [although you could get meta and say it identifies the set of resources that meet your search criteria].

Both Google's Google Base data API and Microsoft's Project Astoria are RESTful protocols for performing sorting, filtering and aggregate operations similar to what you find in SQL over a hierarchical set of resources. What follows is an overview of the approaches taken by both protocols.

Filtering Results using Predicates

Although Astoria provides an abstraction over relational data, it does so in a way that supports the hierarchical nature of HTTP URIs. The primary resource, also known as an entity set, is placed at the root of the hierarchy (e.g. the set of all my customers) and each relationship to another set of resources is treated as another level in the hierarchy (e.g. each customer's orders). Each step in the hierarchy can be filtered using a predicate. Below are some query URLs and the results they return:

Results:All content areas in the Encarta online encyclopedia
Query:[name eq 'Geography']
Results:The content area whose name is 'Geography'
Query:[name eq 'Geography']/Articles
Results:All articles for the content area whose name is 'Geography'
Query:[name eq 'Geography']/Articles[Title eq 'Zimbabwe']
Results:The article with the title 'Zimbabwe' from the 'Geography' content area
Query:[name eq 'Geography']/Articles[761569370]
Results:The article from the 'Geography' content area with the ID 761569370.
Query:[City eq 'London']/Orders[Freight gteq 100]
Results:All orders shipped to customers from the city of London who paid $100 or more in freight charges
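To make the pattern concrete, here's a small sketch of how such hierarchical query paths could be assembled programmatically. The entity set names come from the examples above, but the helper itself is purely illustrative and not part of Astoria:

```python
def astoria_path(*steps):
    """Assemble an Astoria-style hierarchical query path from
    (entity_set, predicate) pairs; a predicate of None means
    'no filter at this level'."""
    parts = []
    for entity_set, predicate in steps:
        parts.append(f"{entity_set}[{predicate}]" if predicate else entity_set)
    return "/".join(parts)

path = astoria_path(
    ("Areas", "name eq 'Geography'"),
    ("Articles", "Title eq 'Zimbabwe'"),
)
print(path)  # Areas[name eq 'Geography']/Articles[Title eq 'Zimbabwe']
```

Each step narrows the hierarchy, which is what makes the whole scheme feel like a natural fit for URLs.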

Google Base does not treat the data within it as a hierarchy. Instead, filters/predicates can be applied to one of two Atom feeds, which represent all the items within Google Base and all the items a specific user has stored within Google Base, respectively. The latter feed requires the HTTP request to be authenticated.

The first way one can filter results from a Google Base feed is by placing one or more categories as part of the path component. For example

Results:All items from the 'hotels' category within Google Base
Results:All items from the 'jobs' or the 'personals' category within Google Base
Results:All items in Google Base except those from the 'recipes' category

The second way to filter results in a Google Base feed is by performing a full text query using the q query parameter. For example,

Results:All items within Google Base that contain the string 'atlanta' in one of their fields
Results:All items from the 'hotels' category within Google Base that contain the string 'atlanta' in any of their fields
Results:All items from the 'hotels' or 'housing' categories within Google Base that contain the string 'seattle' or 'atlanta' in any of their fields
Results:All items from the 'hotels' category within Google Base that contain the string 'washington' but not the string 'dc'

The final way to filter results from a Google Base feed is by applying a predicate on a field of the item using the bq query parameter. For example

Results:All items from the 'hotels' category that have 'seattle' in their location field
Results:All items from the 'hotels' category that have 'seattle' in their location field and 'ramada' in any of their other fields
Query:[location:@"1 Microsoft Way, Redmond, WA, USA" + 5mi]
Results:All items from the 'hotels' category whose location is within 5 miles of "1 Microsoft Way, Redmond, WA, USA"
Query:[price(float USD)>=250.0 USD]
Results:All items from the 'products' category whose price is greater than $250.00 and have 'zune' in one of their fields
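Putting the three filtering mechanisms together, a client might assemble a Google Base query URL along these lines. The base feed URL and the escaping details are simplified for illustration, so treat this as a sketch rather than the official client library:

```python
from urllib.parse import urlencode

# Assumed public snippets feed URL; check the Google Base data API
# documentation for the authoritative endpoints.
BASE = "http://www.google.com/base/feeds/snippets"

def base_query(categories=None, q=None, bq=None):
    """Combine category path segments ('jobs|personals' style),
    a full-text query (q) and a structured predicate (bq)."""
    url = BASE
    if categories:
        url += "/-/" + "|".join(categories)
    params = {}
    if q:
        params["q"] = q      # full-text search over all fields
    if bq:
        params["bq"] = bq    # structured predicate, e.g. [location:"seattle"]
    return url + ("?" + urlencode(params) if params else "")

url = base_query(["hotels"], q="atlanta", bq='[location:"seattle"]')
print(url)
```

Note how categories live in the path while q and bq live in the query string, mirroring the flat namespace Google Base exposes.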

Supported Datatypes, Operators and Functions

As can be seen from the previous examples, both the Google Base data API and Astoria support filtering on fields via string matching. Astoria supports the major SQL datatypes, a list of which can be obtained from the table describing the System.Data.SqlTypes namespace in the .NET Framework. The operations that can be performed on the various fields of an entity are the following comparisons:

ne: Not equal
gt: Greater than
gteq: Greater than or equal
lt: Less than
lteq: Less than or equal

The list of datatypes supported by Google Base is provided in the Google Base data API documentation topic on Attribute Types. In addition to the comparison operators supported by Astoria, the Google Base data API also supports

@"..." + Xmi: Converts the string in quotes to a geocoded location and matches anything within a radius of X miles/kilometers/meters around it, depending on the unit of distance specified
name(type):X..Y: Tests whether the value of the named attribute [and optional type] falls between X and Y (e.g. [event date range:2007-05-20..2007-05-25] matches all events which fall between both dates)
date-range << date-range: Tests if the date range on the right hand side is a subset of the date range on the left hand side
if boolean-expression then expression else expression: Works like an if...else statement in every programming language you've ever used

In addition to these operators, it turns out that the Google Base data API also supports a full-blown expression language for use within predicates. This includes a library of over 20 functions, from math functions like sin and cos to aggregation functions like sum and count as well as more esoteric functions like dist, exists and join. Below are some queries which show these operators and functions in action

Query:[item type:vehicles][location(location)]
Results:All vehicles whose listings contain the text 'sale', with results ordered by geographic proximity to the city of Seattle, WA
Query:[event date range:2007-05-20..2007-05-25]
Results:All events that fall between May 20th 2007 and May 25th 2007


Sorting Results

Sorting query results is often necessary when working with large amounts of data. Both the Google Base data API and Astoria provide a way to indicate that the results should be sorted based on one or more fields. In Astoria, sorting is done using the $orderby query parameter. For example,

Results:All areas in the Encarta encyclopedia sorted alphabetically by their Name
Query:$orderby=OrderDate desc
Results:All customer orders sorted in descending order by order date
Results:All customer orders sorted by the required date and the cost of freight

The Google Base data API uses the orderby and sortorder query parameters to control sorting and sort order respectively. Examples are shown below

Results:All job listings containing the string 'program manager' sorted by the salary field
Query:[item type:vehicles][location(location)]
Results:All vehicles whose listings contain the text 'sale', with results ordered by geographic proximity to the city of Seattle, WA
Query:[location:@"1 Microsoft Way, Redmond, WA, USA" + 5mi]&orderby=[x=bedrooms(int): if exists(x) then max(x) else 0]
Results:All items within the 'housing' category that are within 5 miles of Microsoft's headquarters sorted by number of bedrooms. For items that don't have a bedrooms element use the value 0 when sorting
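For comparison, here are the two sorting syntaxes side by side as a hypothetical helper. The sortorder values shown are my reading of the documentation, so double-check them against the official docs:

```python
from urllib.parse import urlencode

def astoria_sort(fields):
    """Astoria: a single $orderby parameter with a comma-separated
    field list (a per-field 'desc' suffix indicates descending order)."""
    return "$orderby=" + ",".join(fields)

def gbase_sort(field, descending=True):
    """Google Base: separate orderby and sortorder query parameters."""
    return urlencode({"orderby": field,
                      "sortorder": "descending" if descending else "ascending"})

print(astoria_sort(["RequiredDate", "Freight"]))  # $orderby=RequiredDate,Freight
print(gbase_sort("salary"))  # orderby=salary&sortorder=descending
```

Same intent, two spellings: Astoria folds direction into the field list, Google Base splits it into a second parameter.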


Paging Through Results

When dealing with large numbers of items, it often isn't feasible to return all of them in a single XML document for a variety of reasons. Both the Google Base data API and Astoria provide mechanisms to retrieve results as multiple "pages".

In Astoria, this is done using a combination of the top and skip query parameters which indicate the number of items to return and what item to start the list from respectively. Examples below

Results:All areas in the Encarta encyclopedia sorted alphabetically by their Name, restricted to only showing 3 items per page
Results:All areas in the Encarta encyclopedia sorted alphabetically by their Name starting from the second item, restricted to only showing 3 items per page

The Google Base data API uses the max-results and start-index query parameters to indicate the number of items to return and what item to start the list from respectively. The default value of max-results is 25 while its maximum value is 250. The total number of results is emitted in the returned feed as the element openSearch:totalResults. Examples below

Results:All hotels within the seattle area, restricted to 10 results per page
Results:All hotels within the seattle area, restricted to 50 results per page starting from the hundredth result
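The paging math is the same in both protocols, just spelled differently. A sketch, assuming the parameter names described above:

```python
def astoria_page(page, page_size):
    """Astoria: skip whole pages, then take page_size items."""
    return {"top": page_size, "skip": page * page_size}

def gbase_page(page, page_size):
    """Google Base: start-index is 1-based, max-results caps the page."""
    return {"max-results": page_size, "start-index": 1 + page * page_size}

print(astoria_page(2, 50))  # {'top': 50, 'skip': 100}
print(gbase_page(2, 50))    # {'max-results': 50, 'start-index': 101}
```

The only trap is the off-by-one: skip counts items to discard, while start-index names the first item to return.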

Astoria Specific Features

Using links within items to describe relationships is a core aspect of RESTful protocols and is utilized by Astoria to expose the foreign key relationships between rows/entities in the database. However, it can be cumbersome to make multiple requests and follow every link to get all the content related to an item. For this reason, Astoria includes the $expand query parameter which automatically follows the links and retrieves the XML inline. Compare the following queries

Query:[Name eq 'History']/Articles[Title eq 'Civil War, American']
Results:The encyclopedia article on the American Civil War which has links to its Area, ArticleBody, Notes and RelatedArticles
Query:[Name eq 'History']/Articles[Title eq 'Civil War, American']?$expand=ArticleBody,Notes,RelatedArticles
Results:The encyclopedia article on the American Civil War with its ArticleBody, Notes and RelatedArticles shown inline as XML elements
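The payoff of $expand is fewer round trips. A back-of-the-envelope sketch of the request counts for the example above:

```python
# Links exposed by the example article; each unexpanded link costs
# one extra HTTP GET to retrieve its content.
links = ["Area", "ArticleBody", "Notes", "RelatedArticles"]

def request_count(expanded=()):
    """One request for the article itself plus one per link
    that wasn't inlined via $expand."""
    return 1 + sum(1 for link in links if link not in expanded)

print(request_count())                                             # 5
print(request_count(["ArticleBody", "Notes", "RelatedArticles"]))  # 2
```

The trade-off is the usual one: fewer requests but a bigger response, so expanding everything by default would defeat the purpose.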

GData Specific Features

Google Base has a notion of adjusted query results. When this feature is enabled, Google Base will automatically use spelling correction, stemming and other tricks to try and match results. For example, if you perform a search for the value "female" in the gender field of an item, the query adjustment engine will know to also match the value "f" in the gender field of any corresponding items. The query adjustment engine applies its heuristics on queries for field names, field values and item categories. The documentation is contradictory as to whether this feature is enabled by default or has to be specifically enabled by the user of the API.

Another interesting feature is that the Google Base data API allows one to filter out repetitive results using a feature called "crowding". With this feature, limits can be placed on how many results matching a certain criterion should be returned. See the following examples for details

Results:Return all restaurants stored within Google Base but show no more than 2 per cuisine type
NOTE: I actually couldn't get this feature to work using either the example queries from the documentation or queries I constructed. It is quite possible that this feature doesn't work but is so esoteric that no one has noticed.


In comparing both approaches there is a lot to like and dislike. I like the "expand" feature in Astoria as well as the fact that I can retrieve XML results from multiple paths of the hierarchy. However there does seem to be a paucity of operators and functions for better filtering of results.

From the Google Base data API, I love the "crowd" feature and having a full library of functions for performing tests within predicates. Also some of the operators such as the ones for finding results near a certain location are quite impressive although unnecessary for the majority of RESTful protocols out there. That said, I do think they went overboard on some of the features such as having if...else blocks within the URIs. I suspect that some of that complexity wouldn't have been needed if they just had hierarchies instead of a flat namespace that requires complex filtering to get anything out of it.


From the Virtual Earth team's blog post entitled Mobile Search V2 released - Improved Navigation, Cache, Movie Searching, GPS, Traffic reporting and more! we learn

The Mobile Search team has released V2 of the rich client application for Windows Mobile, as well as a major update to the browser based interface. Whether you have a J2ME (Java) phone, Windows Mobile phone, or any other device with a mobile browser, Live Search has you covered with maps, directions and business search.
  • Movie Showtimes:  Want to see a movie but don’t know what’s playing?   Get Movie Showtimes near you and be on your way! 
  • More Local Data with Reviews:  Want to go out for dinner but not sure which restaurant to pick?  Let your fellow restaurant goers help you out – make a decision based on user ratings.
  • Maps:  View Mobile Virtual Earth maps wherever you are.  For improved performance, pop in your storage card to enable the large cache option. Street maps, Aerial and Hybrid are supported.
  • Directions: Lost?  Get found with better support for GPS integration and improved turn-by-turn navigation.  We’ll even prompt you to auto-reroute if you get lost! 
I use this all the time and it is now my favorite Windows Live application. It's a sign that we've come a long way that I now consider printing out driving directions from the Web to be "old school".

I've been using Yahoo's mobile service for movie times but now it looks like I won't be needing it any more. Sweet. What are you waiting for? Head over to and install it on your phone today.


Categories: Windows Live

July 12, 2007
@ 05:32 PM

Before starting this review I should probably point out that I'm a Transformers fan. Every car I've ever owned has had either an Autobot or Decepticon insignia on the bumper and I have most of the seasons of the original TV show on DVD. That said, when I first saw the trailers for Transformers I expected the movie to suck because it seemed like there were too many people and not enough robots. Boy, was I wrong. The movie freaking rocked. The movie had it all: comedy (the kid's parents being typical annoying nosy parents), romance (nerd meets hot girl and impresses her with his cool car), action (soldiers shooting guns, giant robots shooting guns and Optimus Prime vs. Megatron), tragedy (***spoiler deleted***), product placement (GMC trucks, eBay1), quotes from the cartoon ("Autobots... roll out!", "Megatron must be stopped, no matter the cost!", "You've failed me yet again, Starscream") and foreshadowing of the sequel. At two and a half hours I thought the movie would be too long but it didn't seem that way at all. I loved it, my fiancée loved it and so did the kids. Definitely the most fun movie I've seen this summer.

I saw it on Tuesday evening and the theater was packed. This may have been because of the current heat wave forcing people out of their homes and into any air-conditioned building they could find or it could have been because the favorable reviews have gotten people curious about the movie. I suspect it was a little of both.

Rating: **** out of *****

1Is it me or does it seem like there is a crazy amount of product placement from online services these days? In the past month I've seen eBay featured prominently in movies like The 40-Year-Old Virgin and Transformers. Songs about meeting girls which heavily feature MySpace from mainstream acts like Gym Class Heroes and T-Pain. I've seen Google used repeatedly in movies like The Holiday and Knocked Up... even crazier was when Spider-Man unmasked himself as Peter Parker at a press conference in The Amazing Spider-Man issue #533 and in the very next panel there was Google product placement. The only Microsoft mention I've seen in pop culture was some teens playing Gears of War in Live Free or Die Hard, but the name of the game and the Xbox are never mentioned. Our marketing guys have some 'splaining to do.


Categories: Movie Review