On Thursday the 19th of March there was a panel on activity feeds like you find on Twitter & Facebook and social aggregation services like Friendfeed. The panelists were Kevin Marks (@kevinmarks) of Google, John McCrea (@johnmccrea) of Plaxo, Monica Keller (@ciberch) of MySpace, Luke Shepard (@lukeshepard) of Facebook and Marc Canter (@marccanter4real) of Broadband Mechanics. Yours truly was the moderator.

Although the turnout wasn't massive (this wasn't run-of-the-mill content for MIX 09), the audience was very engaged and we had almost 45 minutes of Q&A before we ran out of time. You can find the video here; it is also embedded below if you have Silverlight installed.


Now Playing: Jodeci - My Heart Belongs to You


Categories: Social Software | Trip Report

Our mystery panelist has been unveiled.

Standards for Aggregating Activity Feeds and Social Aggregation Services MIX09-T28F

Thursday, March 19 | 2:30 PM - 3:45 PM | Lando 4201

By: Marc Canter, Monica Keller, Kevin Marks, John McCrea, Dare Obasanjo, Luke Shepard Tags: Services

Come hear a broad panel discussion about aggregating social feeds and services from leading people and companies in this rapidly evolving area including Dare Obasanjo from Microsoft as panel moderator, John McCrea from Plaxo, Kevin Marks from Google, Luke Shepard from Facebook, Marc Canter from Broadband Mechanics, and Monica Keller from MySpace.

You can read my previous post to learn more about what you can expect will be discussed at the panel.

Now Playing: Metallica - ...And Justice For All


Categories: Social Software | Trip Report

Details of my upcoming panel at Microsoft's MIX 09 conference are below.

Standards for Aggregating Activity Feeds and Social Aggregation Services MIX09-T28F

Thursday, March 19 | 2:30 PM - 3:45 PM | Lando 4204 (might be changing to 4201)

By: Marc Canter, Monica Keller, Kevin Marks, John McCrea, Dare Obasanjo Tags: Services

Come hear a broad panel discussion about aggregating social feeds and services from leading people and companies in this rapidly evolving area including Dare Obasanjo from Microsoft as panel moderator, Kevin Marks from Google, Monica Keller from MySpace, Marc Canter from Broadband Mechanics, and John McCrea from Plaxo.

News feeds and activity streams as they pertain to social networks are a pretty hot topic given the rise of services like Twitter and Facebook along with their attendant ecosystems. As more and more services expose activity streams as APIs, and as those streams show up across different web sites such as FriendFeed and on the desktop (e.g. Seesmic for Facebook, TweetDeck's new Facebook integration, etc.), it is a great time to talk about whether we need standards in this space and what they should look like.

There is also interesting food for thought as to whether we have reached the state of the art in this space or whether there is still more innovation to be seen and also what form it could potentially take. Most importantly, as social networks start propagating activity across each other (e.g. using Twitter to update Facebook status, sharing Flickr activities with my Windows Live Messenger buddies, etc) is this the beginning of the much heralded dream of social network interoperability or are we still far away from making this a reality?

PS: We might have a surprise fifth panelist. I'm still working on the details and will update this post if the person can make it.

Now Playing: Mystikal - Unpredictable


Categories: Social Software | Trip Report

Yesterday I was part of the Feed Me: Bite Size Info for a Hungry Internet panel at the SXSW conference. I've been curious to see what the key takeaway would be for people who attended the panel, and it seems the votes are in. The BBC report on the panel is titled Social networks 'are new e-mail', which is extremely similar to the coverage in the VentureBeat report titled Is the social stream the new email?

Although the "X will kill Y" spin is always good for spicing up a technology story there is an interesting trend that is worth watching. For many people, email serves multiple purposes. It is a way to have conversations and also a way for them to share information. However activity streams such as the Facebook news feed, Twitter and the what's new feed on Windows Live are creating new ways for people to share things about themselves or that they find interesting with others. Joe Kraus of Google described this as the shift from active to passive sharing.

These days I'm more likely to post an interesting link by sharing it on Twitter and have it filter out to my social networks on Facebook and Windows Live than I am to share it via email and risk spamming a bunch of my friends and coworkers. As more people embrace social networking, the trend of using email for certain types of sharing will likely decline. It is already notable that Perez Hilton's #1 Traffic Source Is Facebook and it would be interesting to see if this is a trend that crosses more categories of sites (e.g. I wonder whether YouTube gets more traffic from email services or from social networks).

Where things get interesting is in trying to bridge the gap between active and passive sharing. Sending an email about your latest vacation pics may be too intrusive, but you also may not want to just relegate them to your friends' feeds, gone from sight if your friends don't log in in time. David Sacks of Yammer mentions Twitter's @reply feature as one way to bridge this gap. On Twitter, you can push content out into the ether without caring particularly whether your friends read it, but when you do care, a directed message is just a special character away. It will be interesting to see if other services figure out a feature that provides the same functionality as Twitter's @reply and is similarly lightweight.

PS: Food for thought, can you imagine the 25 random things meme on Facebook spreading as effectively if it was an email chain letter?

Now Playing: Disturbing Tha Peace - Growing Pains (Do It Again)


Categories: Social Software | Trip Report

These are my notes from The Search for a More Social Web talk by Dave Morin.

I had expected this talk to be about lessons learned by Facebook along the way as they've built the site. It actually was more of a product launch keynote and overview of recent launches.

There was a brief preamble about the history of communications from carrier pigeons and the postal service to telegrams and computers. Computers have become increasingly social from when they were first connected into a major computer network as ARPANET, followed by the creation of the World Wide Web, then email [ed note - I think he has his chronology wrong with this one] and chat. However social communications via computers hasn't really come into its own until recently.

Facebook has been working on the social graph for a few years. With Facebook, people for the first time are sharing their real identities and personal content on the Web in a big way. As time has gone by the nature of Facebook's social graph has changed. What was once a graph of relationships between college kids is now a graph that includes coworkers, friends, families and even celebrities.

There are three key pillars in the current Facebook experience

  1. The stream (which used to be called the news feed): A chronologically sorted list of what people in your social network are saying, sharing and doing.
  2. Social everywhere: With Facebook Connect they have now made it possible to bring social to every site on the Web.
  3. The social graph: The graph of connections between users which has been expanded to enable users to connect to everything they care about. The recently announced changes to Facebook Pages now enables brands to participate as nodes in the graph and show up in the stream. Examples of such profiles around celebrities and brands on Facebook include CNN, Barack Obama and U2.

At this point, Dave Morin brought on Gary Vaynerchuck from Wine Library TV to talk about his experiences using the new, improved Facebook Pages to promote his brand and connect with his fans. The first comment Gary made was that he was glad he no longer had to put up with the 5000 friend limit to participate fully in Facebook. He considers the changes to have turned Facebook into a word-of-mouth marketing machine on steroids. When he posts a video, it shows up in real time in the news feeds of his thousands of fans who can then redistribute it to their social networks, or like or dislike it, with the click of a button [ed note – I really love that Facebook explicitly separates the notion of re-share and "like" instead of what FriendFeed does]. With Facebook he has the ability to not only broadcast to his customers but also listen to them by reading the hundreds of comments they leave in response to his videos. He considers this an important paradigm shift in how brands communicate with users and considers Facebook to be the most powerful social media marketing tool around today.

Dave then gave an overview of the four key aspects of the redesigned homepage shown below.


Filters enable you to control what content you see in your stream. The stream is a real-time stream of changes from your social network. The publisher allows you to share your thoughts or interesting content. The highlights section functions like the old news feed by showing you content from people you interact with the most or other notable content from your stream so you don't miss it.

The talk then switched focus to Facebook Connect. Facebook Connect enables all 175 million users of the site to take their Facebook experience to a host of partner sites. There are currently 6,000 websites that have implemented Facebook Connect including TechCrunch, Vimeo, Meetup, Geni and Joost. All of these services have mentioned increased engagement from their user base after deploying Facebook Connect. TechCrunch has been getting higher quality comments from people who use the integration since comments are attached to their real names. When you combine Joost with Boxee, it is now possible to bring social to television by seeing what your friends are watching from your TV. There is also the Aardvark social search engine, which will be integrating Facebook Connect [ed note – not clear this last one has actually shipped].

Facebook Connect has been used to bring social not only to the web but to the desktop as well. Two key examples are Xobni integrating with Facebook Connect to bring social to your desktop email client and Apple iPhoto adding the ability to upload photos directly to Facebook. At this point Dave brought on Loic Le Meur of Seesmic, who announced a new desktop client built on Adobe AIR for interacting with Facebook, now available at http://www.seesmic.com/facebook . You can find out more in Loic's blog post about the launch, Seesmic Launches the First Facebook Desktop Client Available Today. Screenshot of the application below. It only does status updates for now but is expected to support richer media types in forthcoming releases.

The question then is what are the next steps for Facebook with regards to their platform. In 2006, they shipped their first set of APIs. In 2007, they shipped the Facebook platform which now has over 50,000 applications. 2008 was the year of Facebook Connect. In 2009, they have done a couple of things thus far. They have opened up APIs to status, notes, links and videos. They are now active in community standards like OpenID and ActivityStreams where they are ably represented by Luke Shepard. They have also contributed to open source projects like memcached.

one more thing…

Dave announced Facebook Connect for the iPhone. He then brought on a number of CEOs of various iPhone application companies to talk about upcoming or just released applications for the iPhone that will integrate with Facebook.

  • The CEO of Playfish, which makes three of the top ten games on the Facebook platform and has over 60 million users, was the first to speak. They are debuting Who has the biggest brain? for the iPhone. You can challenge your friends on Facebook and see where you rank with them directly from the iPhone.
  • He was followed by the CEO of Social Gaming Network (SGN), which is debuting at least two games with Facebook integration. The first is Agency Wars, where users can create a secret agent character then recruit or assassinate their Facebook friends. They are also the makers of iBowl, which will be updated with the ability to see when your Facebook friends are online and challenge them to a game of bowling.
  • One of the co-founders of Tapulous was next to talk about their next product, Tap Tap Revenge 2, which is shipping with not only 250 new songs but will also use Facebook Connect so that you can challenge your Facebook friends to games in a new split-screen mode. Although they have their own social network, it pales in comparison to the highly connected, 175 million user strong social graph on Facebook.
  • Patrick O'Donell from Urbanspoon also spoke about their Facebook integration. Urbanspoon has 1 million restaurant reviews and 4 million iPhone users who have used the "shake" feature of the application over 200 million times. The key integration with Facebook is that restaurant reviews entered via your phone will now show up on your profile in Facebook and in your friends' stream.
  • Joe Greenstein of Flixster was the final iPhone application developer to talk about their Facebook integration. They have 3.4 million users of their application on the iPhone and will now give those users the ability to integrate their Facebook identity with the application.
There are more iPhone applications expected to ship with Facebook integration in the coming weeks including applications from Zynga, Loopt, CitySearch, MTV, Citizen Sports and more.

More details on the Facebook Connect for iPhone announcement can be found on the Facebook official blog post entitled Facebook Connect for iPhone: Friends Now Included.

Now Playing: Dr. Dre - Let Me Ride (remix)


Categories: Trip Report

March 12, 2009
@ 01:01 PM

I'll be attending the SXSW interactive conference this weekend and wanted to share my schedule for any of my blog readers who want to meet up and chat after one of the sessions I'll either be attending or participating in.


10:00 am ·  Is Privacy Dead or Just Very Confused?  (danah boyd on privacy & social networks)

11:30 am ·  The Search for a More Social Web (Dave Morin from Facebook talking about the industry's search to make the Web more social. The kind of talk I'd have expected to be given by Kevin Marks)

2:00 pm ·  Opening Remarks: Tony Hsieh (the founder of Zappos.com talks about their legendary reputation for customer service)

3:30 pm ·  Feed Me: Bite Size Info for a Hungry Internet (your opportunity to heckle me on a panel – Facebook, FriendFeed, Yammer and Windows Live folks on the future of feeds in social networks)

5:00 pm · <nothing in this time slot interests me>

6:30 pm ·  Salon: Friendship is Dead (I always love discussions on whether social networks have devalued the notion of friends or made some of these ties stronger. wonder if they'll talk about Cameron Marlow's research on Maintained relationships in Facebook)


10:00 am · <nothing in this time slot is in my areas of interest>

11:30 am ·  OpenID, OAuth, Data Portability and the Enterprise (Joseph Smarr and Kaliya Hamlin are well known for their work on identity, authentication and authorization concerns on the Web so it will be interesting to hear their perspectives on what lessons can transfer to the enterprise, if any)

2:00 pm · <I'll spend this time getting my affairs in order since I go straight to the airport after my panel>

3:30 pm ·  Post Standards: Creating Open Source Specs (another opportunity to heckle me on a panel – two people from Microsoft on a panel on "open source" specs !?!)

I'll try to write up the sessions I attend while on the plane and will post them next week if I get around to it.

Now Playing: Geto Boys - My Mind's Playin Tricks On Me


Categories: Trip Report

It looks like I'll be participating in two panels at the upcoming SXSW Interactive Festival. The descriptions of the panels are below

  1. Feed Me: Bite Size Info for a Hungry Internet

    In our fast-paced, information overload society, users are consuming shorter and more frequent content in the form of blogs, feeds and status messages. This panel will look at the social trends, as well as the technologies that make feed-based communication possible. Led by Ari Steinberg, an engineering manager at Facebook who focuses on the development of News Feed.

  2. Post Standards: Creating Open Source Specs

    Many of the most interesting new formats on the web are being developed outside the traditional standards process; Microformats, OpenID, OAuth, OpenSocial, and originally Jabber — four out of five of these popular new specs have not been standardized by the IETF, OASIS, or W3C. But real hackers are bringing their implementations to projects ranging from open source apps all the way up to the largest companies in the technology industry. While formal standards bodies still exist, their role is changing as open source communities are able to develop specifications, build working code, and promote it to the world. It isn't that these communities don't see the value in formal standardization, but rather that their needs are different than what formal standards bodies have traditionally offered. They care about ensuring that their technologies are freely implementable and are built and used by a diverse community where anyone can participate based on merit and not dollars. At OSCON last year, the Open Web Foundation was announced to create a new style of organization that helps these communities develop open specifications for the web. This panel brings together community leaders from these technologies to discuss the "why" behind the Open Web Foundation and how they see standards bodies needing to evolve to match lightweight community driven open specifications for the web.

If you'll be at SXSW and are a regular reader of my blog who would like to chat in person, feel free to swing by during one or both panels. I'd also be interested in what people who plan to attend either panel would like to get out of the experience. Let me know in the comments.

Now Playing: Estelle - American Boy (feat. Kanye West)


Categories: Trip Report

This past weekend I attended the O’Reilly Social Graph FOO Camp and got to meet a bunch of folks who I’ve only “known” via their blogs or news stories about them. My favorite moment was talking to Mark Zuckerberg about stuff I think is wrong with Facebook: he stops for a second while I’m telling him the story of naked pictures in my Facebook news feed, then says “Dare? I read your blog”. Besides that my favorite part of the experience was learning new things from folks with different perspectives and technical backgrounds from me. Whether it was hearing different perspectives on the social graph problem from folks like Joseph Smarr and Blaine Cook, getting schooled on the various real-world issues around using OpenID/OAuth in practice from John Panzer and Eran Hammer-Lahav or getting to Q&A Brad Fitzpatrick about the Google Social Graph API, it was a great learning experience all around.

There have been some ideas tumbling around in my head all week and I wanted to wait a few days before blogging to make sure I’d let the ideas fully marinate. Below are a few of the more important ideas I took away from the conference.

Social Network Discovery vs. Social Graph Portability

One of the most startling realizations I made during the conference is that a lot of my assumptions about why developers of social applications are interested in what has been mistakenly called “social graph portability” were incorrect. I had assumed a lot of social networking sites that utilize the password anti-pattern to screen scrape a user’s Hotmail/Y! Mail/Gmail/Facebook address book were doing so to get a list of the user’s friends to spam with invitations to join the service. However a lot of the folks I met at the SG FOO Camp made me realize how much of a bad idea this would be if they actually did that. Sending out a lot of spam would lead to negativity being associated with their service and brand (Plaxo is still dealing with a lot of the bad karma they generated from their spammy days).

Instead the way social applications often use the contacts from a person’s email address book is to satisfy the scenario in Brad Fitzpatrick’s blog post URLs are People, Too where he wrote

So you've just built a totally sweet new social app and you can't wait for people to start using it, but there's a problem: when people join they don't have any friends on your site. They're lonely, and the experience isn't good because they can't use the app with people they know.  

I then thought of my first time using Twitter and Facebook, and how I didn’t consider them of much use until I started interacting with people I already knew that used those services. More than once someone has told me, “I didn’t really get why people like Facebook until I got over a dozen friends on the site”.

So the issue isn’t really about “portability”. After all, my “social graph” of Hotmail or Gmail contacts isn’t very useful on Twitter if none of my friends use the service. Instead it is about “discovery”.

Why is this distinction important? Let’s go back to the complaint that Facebook doesn’t expose email addresses in its API. The site actually hides all contact information from its API, which is understandable. However, since email addresses are also the only global identifiers we can rely on for uniquely identifying users on the Web, they are useful as a way of figuring out whether Carnage4Life on Twitter is actually Dare Obasanjo on Facebook, since you can just check if they are backed by the same email address.

I talked to both John Panzer and Brad Fitzpatrick about how we could bridge this gap and Brad pointed out something really obvious which he takes advantage of in the Google Social Graph API. We can just share email addresses using foaf:mbox_sha1sum (i.e. cryptographic one-way hashes of email addresses). That way we all have a shared globally unique identifier for a user but services don’t have to reveal their users’ email addresses.
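To make this concrete, here's a minimal Python sketch of computing a foaf:mbox_sha1sum. Per the FOAF spec the hash is taken over the full mailto: URI, not the bare address; the lowercasing below is my own normalization assumption (services sharing hashes would have to agree on one canonical form for matching to work):

```python
import hashlib

def mbox_sha1sum(email: str) -> str:
    """foaf:mbox_sha1sum: the SHA-1 hex digest of the mailto: URI.

    Lowercasing here is an assumption of this sketch, not mandated by FOAF.
    """
    uri = "mailto:" + email.strip().lower()
    return hashlib.sha1(uri.encode("ascii")).hexdigest()

# Two services that both know the same address derive the same opaque
# 40-character identifier without ever exchanging the address itself.
print(mbox_sha1sum("carnage4life@example.com"))
```

The nice property is that comparing two users across services becomes a simple equality check on the hashes, with no email address ever leaving either service.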

I wonder how we can convince the folks working on the Facebook platform to consider adding this as one of the properties returned by Users.getInfo?

You Aren’t Really My Friend Even if Facebook Says We Are

In a post entitled A proposal: email to URL mapping Brad Fitzpatrick wrote

People have different identifiers, of different security, that they give out depending on how much they trust you. Examples might include:

  • Homepage URL (very public)
  • Email address (little bit more secret)
  • Mobile phone number (perhaps pretty secretive)

When I think back to Robert Scoble getting kicked off of Facebook for screen scraping his friends’ email addresses and dates of birth into Plaxo, I wonder how many of his Facebook friends are comfortable with their personal contact information, including email addresses, cell phone numbers and home addresses, being utilized by Robert in this manner. A lot of people argued at SG FOO Camp that “If you’ve already agreed to share your contact info with me, why should you care whether I write it down on paper or download it into some social networking site?”

That’s an interesting question.

I realized that one of my answers is that I actually don’t even want to share this info with the majority of the people in my Facebook friends list in the first place [as Brad points out]. The problem is that Facebook makes this a somewhat binary decision. Either I’m your “friend” and you get all my private personal details or I’ve faceslammed you by ignoring your friend request or only giving you access to my Limited Profile. I once tried to friend Andrew ‘Boz’ Bosworth (a former Microsoft employee who works at Facebook) and he told me he doesn’t accept friend requests from people he didn’t know personally so he ignored the friend request. I thought it was fucking rude even though objectively I realize it makes sense since it would mean I could view all his personal wall posts as well as his contact info. Funny enough, I always thought that it was a flaw in the site’s design that we had to have such an awkward social interaction.

I think the underlying problem again points to Facebook’s poor handling of multiple social contexts. In the real world, I separate my interactions with co-workers from that with my close friends or my family. For an application that wants to be the operating system underlying my social interactions, Facebook doesn’t do a good job of handling this fundamental reality of adult life.

Now playing: D12 - Revelation


Categories: Social Software | Trip Report

Yesterday I got offered an opportunity to interview Vint Cerf just before he gave his talk entitled Tracking the Internet into the 21st Century (link is to video of the talk) at Google's Kirkland offices. I got to ask the questions I blogged about yesterday and also learned about some of Vint Cerf's interests in Nigeria. Below are the questions I asked and his paraphrased answers to my questions.


Q: Why did he decide to leave MCI, a company steeped in networking technology, to join Google, an advertising and search engine company, as Chief Internet Evangelist?

A: The job title was not his doing. Larry, Sergey and Eric told him they wanted him to continue his efforts in encouraging the growth of the Internet around the world and thought the title "Chief Internet Evangelist" best fit this position. There are 6.5 billion people on the planet today yet there are only 1 billion people on the Internet. Google would like to see the other 5 billion people on the Internet because the more people there are using the Internet, the better it is for Google. This is why the company needed a "Chief Internet Evangelist".

Vint Cerf spends a significant portion of his time encouraging implementations of the Internet. He travels all over the world meeting with senior government officials (presidents, ministers of information, etc) to recommend Internet friendly policies, discourage the rise of monopolistic or closed networks and encourage domestic/foreign investments in fledgling markets where Internet usage hasn't taken off. For example, he is working with some charitable entities to donate solar powered Internet cafes to businesses in Nigeria to encourage the usage of the Internet in remote or underprivileged parts of the country.

One aspect of the Internet's growth which he didn't pay much attention to at first but does now is Internet enabled mobile phones. It is estimated that there will be 3 billion people with mobile phones by the end of the year. That is 3 billion people who could all be connected to the Internet if Internet connectivity became ubiquitous on mobile devices within a few years.

Looking back at the past few years, it is clear that adding more users to the Internet increases the quantity and diversity of information on the Web. This trend has been hastened by the rise of the consumer as producer. We now have people who would traditionally be considered consumers producing content on blogs, video sharing sites like YouTube, and creating social networks on sites like Orkut and Facebook. Another interesting trend is the rise of virtual worlds like World of Warcraft and Second Life. In these worlds users are creating interesting economic and sociological experiments with fascinating consequences (e.g. gold farming in China). In fact, some college professors are encouraging their students to join these sites to test out economic and sociological theories in ways that simply weren't feasible in the past. An interesting idea would be to see if we could create virtual objects which were associated with and could influence objects in the real world. For example, a virtual university where the electron microscopes and telescopes actually displayed image data from electron microscopes and telescopes in the real world. Maybe as an optimization we could cache large amounts of the astronomical data so multiple instances of the virtual telescope could be used at once but only rarely would the physical telescope have to be used, so there wasn't resource contention. Given that Google has already started partnering with NASA to store and process large amounts of astronomical data, this may be something that the company could be interested in trying out in the future.

Q: He has spoken out on Google's behalf in favor of net neutrality. However there seem to be many different definitions of Net Neutrality, some of which imply that having different tiers for Quality of Service is OK and some of which don't, which definition is Google in favor of and why?

A: Google didn't start the network neutrality debate, AT&T's CEO Ed Whitacre did when he claimed that companies like Google are getting a "free ride" on his network. This seems backwards to Vint Cerf since AT&T's customers pay broadband fees so they can access any site on the Internet. Expecting companies to pay AT&T for access to its paying customers, who are already paying for access to the Internet, is old school "Telephone Think" that harkens back to the monopoly days of Ma Bell.

The philosophy of the Internet comes from completely different roots. The philosophy was pretty much "Here are the specs; if you can figure out how to implement our protocols and can connect to our network then it's all good". This open philosophy is what enabled the growth of the Internet and eventually led commercial entities [including telcos like AT&T] to become part of the network.

Vint Cerf and Google's definition of network neutrality has five basic pillars:

  1. Users should be able to reach any service connected to the network.
  2. Users should be able to run any application and connect to the network (of course, this doesn't apply to applications that violate the law).
  3. It is OK to charge for higher speed connections to the network.
  4. Operators should not discriminate against services a user is trying to access by varying the user's QoS or access charges when accessing that service.
  5. Discrimination against a type of service (e.g. all video traffic gets different QoS) is OK but singling out specific sites is not.

A number of ISPs already break these rules yet are not upfront with users that they are not getting a full Internet experience. Some claim that these rules limit the ability of ISPs to prevent denial of service attacks, fight spam and perform other activities that protect their networks. Google believes that such protections can still be enforced but should be done at the application layer and not by discriminating against packets. As for ISPs that believe this limits their ability to provide value added services [such as video sharing], the response is that competition should be based on providing innovative services instead of artificially limiting the capabilities of your competitors because you control the network pipes.

Google wants the Internet to be an open environment which allows for innovation. They believe this is important to the Internet's growth.

Q: Google just pledged to spending up to $4.6 billion to license the 700MHz wireless spectrum in what the company has described as the most significant auction of wireless spectrum in history by the U.S. federal government. Why is this auction so significant and what kind of services can we expect from Google if it wins the auction?

A: [Editor's Note: Why this auction is significant is summarized quite well in David Stone's post Vint Cerf and the 700MHz Spectrum]
Google's primary goal is to increase the openness of Internet-connected networks around the world. This is why they've committed at least $4.6 billion to licensing the 700MHz wireless spectrum.

It isn't quite clear what business model Google will use the 700MHz spectrum for if they win the auction. Whatever they end up deciding, it will honor the four principles of open platforms they have espoused with regards to wireless networks. It is quite likely that leasing out this capacity is one of the business models Google will try out. However due to the propagation characteristics of the 700MHz band, it is likely that different business models will have to apply in rural versus urban environments.

Q: Net neutrality gets a lot of press, however there are other issues facing the Internet as well. What keeps him up at night besides net neutrality? Botnets? Government censorship of the Internet? Concerns that we’ll never upgrade from the current version of the Internet since it is already so entrenched around the world?

A: The rise of botnets, domain name security and the problems related to handling internationalized domain names (IDNs) are at the top of the list of problems facing the Internet that concern Vint Cerf. The IDN problem is particularly pernicious: not only did we have to figure out how to support non-ASCII characters in a system that was never designed for them, but once a way was found, the IDN homograph attack was born, which promptly reversed most of the gains.

Switching to IPv6 is also an issue facing the Internet that we will have to deal with sooner than most people expect. Some have predicted that at the current rate of allocation by ICANN we will run out of IPv4 addresses by 2011. At that point, it will start to look a lot more attractive to switch to IPv6. There may be workarounds, such as people leasing some of the blocks they've been allocated to other parties, but this leads to interesting problems for routers since the routing tables will have to be adjusted to accommodate these shenanigans. Given that pretty much all the major operating systems (Vista, Mac OS X, *nix, etc.) and networking equipment manufacturers (e.g. Juniper, Cisco) support IPv6, it's really up to the ISPs, and they likely won't make any moves without customer demand. Unfortunately for them, things are liable to get ugly in the next five years or so and they may have to change their minds.


These are my notes from the talk Scaling Google for Every User by Marissa Mayer.

Google search has lots of different users who vary in age, sex, location, education, expertise and a lot of other factors. After lots of research, it seems the only factor that really influences how different users view search relevance is their location.

One thing that does distinguish users is the difference between a novice search user and an expert user of search. Novice users typically type queries in natural language while expert users use keyword searches.

Example Novice and Expert Search User Queries

NOVICE QUERY: Why doesn't anyone carry an umbrella in Seattle?
EXPERT QUERY: weather seattle washington

NOVICE QUERY: can I hike in the seattle area?
EXPERT QUERY: hike seattle area

On average, it takes a new Google user 1 month to go from typing novice queries to being a search expert. This means that there is little payoff in optimizing the site to help novices since they become search experts in such a short time frame.

Design Philosophy

In general, when it comes to the PC user experience, the more features available the better the user experience. When it comes to handheld devices, however, the graph is a bell curve: past a certain point, adding extra features makes the user experience worse. At Google, they believe their experience is more like the latter and tend to hide features on the main page, only showing them when necessary (e.g. after the user has performed a search). This is in contrast to the portal strategy from the 1990s when sites would list their entire product line on the front page.

When tasked with taking over the user interface for Google search, Marissa Mayer fell back on her AI background and focused on applying mathematical reasoning to the problem. Like Amazon, they decided to use split A/B testing to try different changes they planned to make to the user interface and see which got the best reaction from their users. One example of the kind of experiments they've run came when the founders asked whether they should switch from displaying 10 search results by default, since Yahoo! was displaying 20 results. They'd only picked 10 results arbitrarily because that's what Alta Vista did. They ran some focus groups, and the majority of users said they'd like to see more than 10 results per page. So they ran an experiment with 20, 25 and 30 results and were surprised at the outcome. After 6 weeks, 25% of the people who were getting 30 results used Google search less while 20% of the people getting 20 results used the site less. The initial suspicion was that people weren't having to click the "next" button as much because they were getting more results, but further investigation showed that people rarely click that link anyway. Then the Google researchers realized that while it took 0.4 seconds on average to render 10 results, it took 0.9 seconds on average to render 25 results. This seemingly imperceptible lag was still enough to sour the experience of users to the point that they reduced their usage of the service.
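The split testing described above can be sketched in a few lines. This is a hypothetical illustration (the bucketing scheme, variant names and user ids are all invented), not Google's actual experiment framework; hashing the user id is one common way to keep assignments stable across visits:

```python
import hashlib

def assign_bucket(user_id: str, variants: list[str]) -> str:
    """Deterministically assign a user to an experiment variant.

    Hashing the user id (rather than choosing randomly per request)
    keeps each user in the same variant for the experiment's duration.
    """
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Users split roughly evenly across the result-count variants
# from the experiment described above (10, 20, 25 and 30 results).
variants = ["10_results", "20_results", "25_results", "30_results"]
print(assign_bucket("user-12345", variants))
```

The key property is that the same user always lands in the same bucket, so usage can be compared across cohorts over the six-week window.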

Improving Google Search

There are a number of factors that determine whether a user will find a set of search results relevant, including the query, the user's individual tastes, the task at hand and the user's locale. Locale is especially important because a query such as "GM" is likely to be a search for General Motors while a query such as "GM foods" is most likely seeking information about genetically modified foods. Given a large enough corpus of data, statistical inference can seem almost like artificial intelligence. Another example is that a search like b&b ab looks for bed and breakfasts in Alberta while ramstein ab locates the Ramstein Air Force Base. Since b&b typically means bed and breakfast, in a query like "b&b ab" the term after "b&b" is assumed to be a place name, based on statistical inference over millions of such queries.

At Google they want to get even better at knowing what you mean instead of just looking at what you say. Here are some examples of user queries which Google will transform into other queries based on statistical inference [in future versions of the search engine]:

User Query → Query Google Will Also Try

  • unchanged lyrics van halen → lyrics to unchained by van halen
  • how much does it cost for an exhaust system → cost exhaust system
  • overhead view of bellagio pool → bellagio pool pictures
  • distance from zurich switzerland to lake como italy → train milan italy zurich switzerland

Performing query inference in this manner is a very large scale, ill-defined problem. Another effort Google is pursuing is cross-language information retrieval: a query in one language is translated into a foreign language, and the results are then translated back into the user's language. This may not be particularly interesting for English speakers since most of the Web is in English, but it will be valuable for other languages (e.g. an Arabic speaker interested in reviews of New York City restaurants).

Google Universal Search was a revamp of the core engine to show results other than text-based URLs and website summaries in the search results (e.g. search for nosferatu). There were a number of challenges in building this functionality such as

  • Google's search verticals such as book, blog, news, video, and image search got a lot less traffic than the main search engine and originally couldn't handle receiving the same level of traffic as the main page.
  • How do you rank results across different media to figure out the most relevant? How do you decide a video result is more relevant than an image or a webpage? This problem was tackled by Udi Manber's team.
  • How do you integrate results from other media into the existing search result page? Should results be segregated by type or should it be a list ordered by relevance independent of media type? The current design was finally decided upon by Marissa Mayer's team but they will continue to incrementally improve it and measure the user reactions.

At Google, the belief is that the next big revolution is a search engine that understands what you want because it knows you. This means personalization is the next big frontier. A couple of years ago, the tech media was full of reports that a bunch of Stanford students had figured out how to make Google five times faster. This was actually incorrect. The students had figured out how to make PageRank calculations faster which doesn't really affect the speed of obtaining search results since PageRank is calculated offline. However this was still interesting to Google and the students' company was purchased. It turns out that making PageRank faster means that they can now calculate multiple PageRanks in the time it used to take to calculate a single PageRank (e.g. country specific PageRank, personal PageRank for a given user, etc). The aforementioned Stanford students now work on Google's personalized search efforts.

Speaking of personalization, iGoogle has become their fastest growing product of all time. Allowing users to create a personalized page and then opening up the platform to developers such as Caleb to build gadgets lets them learn more about their users. Caleb's collection of gadgets garners about 30 million daily page views across various personalized homepages.


Q: Does the focus on expert searchers mean that they de-emphasize natural language processing?
A: Yes, in the main search engine. However they do focus on it for their voice search product and they do believe that it is unfortunate that users have to adapt to Google's keyword based search style.

Q: How do the observations that are data mined about users search habits get back into the core engine?
A: Most of it happens offline not automatically. Personalized search is an exception and this data is uploaded periodically into the main engine to improve the results specific to that user.

Q: How well is the new Universal Search interface doing?
A: As well as Google Search is since it is now the Google search interface.

Q: What is the primary metric they look at during A/B testing?
A: It depends on what aspect of the service is being tested.

Q: Has there been user resistance to new features?
A: Not really. Google employees are actually more resistant to changes in the search interface than their average user.

Q: Why did they switch to showing Google Finance before Yahoo! Finance when showing search results for a stock ticker?
A: Links used to be ordered by ComScore metrics but since Google Finance shipped they decided to show their service first. This is now a standard policy for Google search results that contain links to other services.

Q: How do they tell if they have bad results?
A: They have a bunch of watchdog services that track uptime for various servers to make sure a bad one isn't causing problems. In addition, they have 10,000 human evaluators who are always manually checking the relevance of various results.

Q: How do they deal with spam?
A: Lots of definitions for spam; bad queries, bad results and email spam. For keeping out bad results they do automated link analysis (e.g. examine excessive number of links to a URL from a single domain or set of domains) and they use multiple user agents to detect cloaking.
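The cloaking check mentioned in the answer above can be sketched as follows. This is a hypothetical illustration of the idea, not Google's crawler: `fetch` is a stand-in for an HTTP client that sends the given User-Agent header, and the site and page contents are invented:

```python
# Cloaking: serving different content to search engine crawlers than
# to regular browsers. Fetching the same URL with multiple user
# agents and comparing the responses is one way to detect it.
def looks_cloaked(fetch, url):
    as_bot = fetch(url, user_agent="Googlebot/2.1")
    as_user = fetch(url, user_agent="Mozilla/5.0")
    return as_bot != as_user

# Simulated responses for a site serving keyword-stuffed pages to bots.
pages = {
    ("http://spam.example", "Googlebot/2.1"): "cheap pills pills pills",
    ("http://spam.example", "Mozilla/5.0"): "totally innocent page",
}
fake_fetch = lambda url, user_agent: pages[(url, user_agent)]
print(looks_cloaked(fake_fetch, "http://spam.example"))  # True
```

A real check would also have to tolerate legitimate variation (ads, timestamps), so an exact-equality comparison is only the simplest possible version.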

Q: What percent of the Web is crawled?
A: They try to crawl most of it except content behind sign-ins and product databases. For product databases they now have Google Base and encourage people to upload their data there so it is accessible to Google.

Q: When will I be able to search using input other than text (e.g. find this tune or find the face in this photograph)?
A: We are still a long way from this. In academia, we now have experiments that show 50%-60% accuracy but that's a far cry from being a viable end user product. Customers don't want a search engine that gives relevant results half the time.


Categories: Trip Report

These are my notes from the talk Lessons in Building Scalable Systems by Reza Behforooz.

The Google Talk team has produced multiple versions of their application. There is

  • a desktop IM client which speaks the Jabber/XMPP protocol.
  • a Web-based IM client that is integrated into GMail
  • a Web-based IM client that is integrated into Orkut
  • An IM widget which can be embedded in iGoogle or in any website that supports embedding Flash.

Google Talk Server Challenges

The team has had to deal with a significant set of challenges since the service launched including

  • Support displaying online presence and sending messages for millions of users. Peak traffic is in hundreds of thousands of queries per second with a daily average of billions of messages handled by the system.

  • routing and application logic has to be applied to each message according to the preferences of each user while keeping latency under 100ms.

  • handling surge of traffic from integration with Orkut and GMail.

  • ensuring in-order delivery of messages

  • needing an extensible architecture which could support a variety of clients


The most important lesson the Google Talk team learned is that you have to measure the right things. Questions like "how many active users do you have" and "how many IM messages does the system carry a day" may be good for evaluating marketshare but are not good questions from an engineering perspective if one is trying to get insight into how the system is performing.

Specifically, the biggest strain on the system actually turns out to be displaying presence information. The formula for determining how many presence notifications they send out is

total_number_of_connected_users * avg_buddy_list_size * avg_number_of_state_changes

Sometimes there are drastic jumps in these numbers. For example, integrating with Orkut increased the average buddy list size since people usually have more friends in a social networking service than they have IM buddies.
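The formula above can be turned into a quick back-of-the-envelope calculator. The numbers plugged in below are invented purely for illustration; the point is that notification volume scales with the product of the three factors, so doubling any one of them doubles the load:

```python
def presence_notifications(connected_users, avg_buddy_list_size,
                           avg_state_changes):
    """Estimate total presence notifications fanned out by the servers.

    Every time a user changes state (online, away, idle, ...), a
    notification goes to each buddy, so volume grows with the product
    of all three factors rather than with user count alone.
    """
    return connected_users * avg_buddy_list_size * avg_state_changes

# Hypothetical numbers: doubling the average buddy list (as might
# happen after an integration like Orkut's) doubles the volume.
before = presence_notifications(1_000_000, 10, 6)
after = presence_notifications(1_000_000, 20, 6)
print(before, after)  # 60000000 120000000
```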

Other lessons learned were

  1. Slowly Ramp Up High Traffic Partners: To see what real world usage patterns would look like when Google Talk was integrated with Orkut and GMail, both services added code to fetch online presence from the Google Talk servers to their pages that displayed a user's contacts without adding any UI integration. This way the feature could be tested under real load without users being aware that there were any problems if there were capacity problems. In addition, the feature was rolled out to small groups of users at first (around 1%).

  2. Dynamic Repartitioning: In general, it is a good idea to divide user data across various servers (aka partitioning or sharding) to reduce bottlenecks and spread out the load. However, the infrastructure should support redistributing these partitions/shards without having to cause any downtime.

  3. Add Abstractions that Hide System Complexity: Partner services such as Orkut and GMail don't know which data centers contain the Google Talk servers, how many servers are in the Google Talk cluster and are oblivious of when or how load balancing, repartitioning or failover occurs in the Google Talk service.

  4. Understand Semantics of Low Level Libraries: Sometimes low level details can stick it to you. The Google Talk developers found out that using epoll worked better than the poll/select loop because they have lots of open TCP conections but only a relatively small number of them are active at any time.

  5. Protect Against Operational Problems: Review logs and endeavor to smooth out spikes in activity graphs. Limit cascading problems by having logic to back off from using busy or sick servers.

  6. Any Scalable System is a Distributed System: Apply the lessons from the fallacies of distributed computing. Add fault tolerance to all your components. Add profiling to live services and follow transactions as they flow through the system (preferably in a non-intrusive manner). Collect metrics from services for monitoring both for real time diagnosis and offline generation of reports.
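The epoll observation in point 4 can be illustrated with Python's standard selectors module, which picks epoll on Linux. This is only a sketch of the readiness-notification pattern, not the Google Talk server code:

```python
import selectors
import socket

# selectors.DefaultSelector picks the best mechanism for the
# platform -- epoll on Linux -- whose cost scales with the number
# of *ready* sockets rather than the number of registered ones,
# unlike poll/select which scan every descriptor on each call.
# That difference matters when most open connections are idle.
sel = selectors.DefaultSelector()

server = socket.socket()
server.bind(("127.0.0.1", 0))  # any free port
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

# One call multiplexes all registered sockets.
events = sel.select(timeout=0)
print(len(events))  # 0: no client has connected yet
sel.close()
server.close()
```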

Recommended Software Development Strategies

Compatibility is very important, so making sure deployed binaries are backwards and forward compatible is always a good idea. Giving developers access to live servers (ideally public beta servers, not main production servers) will encourage them to test and try out ideas quickly. It also gives them a sense of empowerment. Developers end up making their systems easier to deploy, configure, monitor, debug and maintain when they have a better idea of the end-to-end process.

Building an experimentation platform which allows you to empirically test the results of various changes to the service is also recommended.


Categories: Platforms | Trip Report

These are my notes from the talk Using MapReduce on Large Geographic Datasets by Barry Brummit.

Most of this talk was a repetition of the material in the previous talk by Jeff Dean including reusing a lot of the same slides. My notes primarily contain material I felt was unique to this talk.

A common pattern across a lot of Google services is creating lots of index files and loading them into memory to make lookups fast. This is also done by the Google Maps team, which has to handle massive amounts of data (e.g. there are over a hundred million roads in North America).

Below are examples of the kinds of problems the Google Maps team has used MapReduce to solve.

Locating all points that connect to a particular road
  • Input: List of roads and intersections
  • Map: Create pairs of connected points such as {road, intersection} or {road, road} pairs
  • Shuffle: Sort by key
  • Reduce: Get the list of pairs with the same key
  • Output: A list of all the points that connect to a particular road

Rendering map tiles
  • Input: Geographic feature list
  • Map: Emit each feature on a set of overlapping lat/long rectangles
  • Shuffle: Sort by key
  • Reduce: Emit the tile using data for all enclosed features
  • Output: Rendered tiles

Finding the nearest gas station to an address within five miles
  • Input: Graph describing the node network with all gas stations marked
  • Map: Search a five mile radius of each gas station and mark the distance to each node
  • Shuffle: Sort by key
  • Reduce: For each node, emit the path and gas station with the shortest distance
  • Output: Graph marked with the nearest gas station to each node

When issues are encountered in a MapReduce it is possible for developers to debug these issues by running their MapReduce applications locally on their desktops.

Developers who would like to harness the power of a several hundred to several thousand node cluster but do not work at Google can try

Recruiting Sales Pitch

[The conference was part recruiting event so some of the speakers ended their talks with a recruiting spiel. - Dare]

The Google infrastructure is the product of Google's engineering culture, which has the following ten characteristics

  1. single source code repository for all Google code
  2. Developers can checkin fixes for any Google product
  3. You can build any Google product in three steps (get, configure, make)
  4. Uniform coding standards across the company
  5. Mandatory code reviews before checkins
  6. Pervasive unit testing
  7. Tests run nightly, emails sent to developers if any failures
  8. Powerful tools that are shared company-wide
  9. Rapid project cycles, developers change projects often, 20% time
  10. Peer driven review process, flat management hierarchy


Q: Where are intermediate results from map operations stored?
A: In BigTable or GFS

Q: Can you use MapReduce incrementally? For example, when new roads are built in North America do we have to run MapReduce over the entire data set or can we factor in only the changed data?
A: Currently, you'll have to process the entire data stream again. However this is a problem that is the target of lots of active research at Google since it affects a lot of teams.


Categories: Platforms | Trip Report

These are my notes from the keynote session MapReduce, BigTable, and Other Distributed System Abstractions for Handling Large Datasets by Jeff Dean.

The talk was about the three pillars of Google's data storage and processing platform; GFS, BigTable and MapReduce.


The developers at Google decided to build their own custom distributed file system because they felt that they had unique requirements. These requirements included

  • scalable to thousands of network nodes
  • massive read/write bandwidth requirements
  • ability to handle large blocks of data which are gigabytes in size.
  • need extremely efficient distribution of operations across nodes to reduce bottlenecks

One benefit the developers of GFS had was that since it was an in-house application they could control the environment, the client applications and the libraries a lot better than in the off-the-shelf case.

GFS Server Architecture

There are two server types in the GFS system.

Master servers
These keep the metadata on the various data files (in 64MB chunks) within the file system. Client applications talk to the master servers to perform metadata operations on files or to locate the actual chunk server that contains the actual bits on disk.
Chunk servers
These contain the actual bits on disk and can be considered to be dumb file servers. Each chunk is replicated across three different chunk servers to create redundancy in case of server crashes. Client applications retrieve data files directly from chunk servers once they've been directed to the chunk server which contains the chunk they want by a master server.
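The master/chunk-server interaction described above can be sketched as a toy in-memory mock. The class and method names are invented for illustration (the real RPC protocol is not public); the point is the two-step read path, where metadata comes from the master but bytes come directly from a chunk server:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS chunks are 64MB

class Master:
    """Holds only metadata: which chunks make up a file and where they live."""
    def __init__(self):
        # file name -> list of (chunk_id, [replica server ids])
        self.metadata = {}

    def locate(self, filename, offset):
        chunk_index = offset // CHUNK_SIZE
        chunk_id, replicas = self.metadata[filename][chunk_index]
        return chunk_id, replicas

class ChunkServer:
    """A dumb file server holding the actual bytes on disk."""
    def __init__(self):
        self.chunks = {}  # chunk_id -> bytes

    def read(self, chunk_id, offset, length):
        return self.chunks[chunk_id][offset:offset + length]

# Client flow: ask the master where the chunk is, then read it
# directly from the chunk server, keeping the master off the data path.
master, cs = Master(), ChunkServer()
cs.chunks["c1"] = b"hello gfs"
master.metadata["/logs/day1"] = [("c1", ["cs-1"])]

chunk_id, replicas = master.locate("/logs/day1", 0)
print(cs.read(chunk_id, 0, 5))  # b'hello'
```

Keeping the master on the metadata path only is what lets a single master front thousands of chunk servers.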

There are currently over 200 GFS clusters at Google, some of which have over 5000 machines. They now have pools of tens of thousands of machines retrieving data from GFS clusters that run as large as 5 petabytes of storage with read/write throughput of over 40 gigabytes/second across the cluster.


At Google they do a lot of processing of very large amounts of data. In the old days, developers would have to write their own code to partition the large data sets, checkpoint code and save intermediate results, handle failover in case of server crashes, and so on, in addition to writing the business logic for the actual data processing they wanted to do, which could be something straightforward like counting the occurrence of words in various Web pages or grouping documents by content checksums. The decision was made to reduce the duplication of effort and complexity of performing data processing tasks by building a platform technology that everyone at Google could use which handled all the generic tasks of working on very large data sets. So MapReduce was born.

MapReduce is an application programming interface for processing very large data sets. Application developers feed in a key/value pair (e.g. {URL, HTML content} pair) then use the map function to extract relevant information from each record which should produce a set of intermediate key/value pairs (e.g. {word, 1} pairs for each time a word is encountered) and finally the reduce function merges the intermediate values associated with the same key to produce the final output (e.g. {word, total count of occurrences} pairs).

A developer only has to write their specific map and reduce operations for their data sets which could run as low as 25 - 50 lines of code while the MapReduce infrastructure deals with parallelizing the task and distributing it across different machines, handling machine failures and error conditions in the data, optimizations such as moving computation close to the data to reduce I/O bandwidth consumed, providing system monitoring and making the service scalable across hundreds to thousands of machines.
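The word-count flow described above can be sketched as a single-process simulation of the MapReduce model. The function names are invented; the real system distributes each phase across many machines, but the developer-visible contract is just the map and reduce functions:

```python
from itertools import groupby

# The developer supplies only these two functions.
def map_fn(url, html):
    """Emit {word, 1} for each word in a document."""
    for word in html.split():
        yield word, 1

def reduce_fn(word, counts):
    """Merge intermediate values for the same key."""
    yield word, sum(counts)

def map_reduce(records, map_fn, reduce_fn):
    # Map phase: emit intermediate key/value pairs.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))
    # Shuffle phase: group intermediate pairs by key.
    intermediate.sort(key=lambda kv: kv[0])
    output = []
    for key, group in groupby(intermediate, key=lambda kv: kv[0]):
        output.extend(reduce_fn(key, [v for _, v in group]))
    return output

docs = [("url1", "the quick fox"), ("url2", "the lazy dog")]
print(map_reduce(docs, map_fn, reduce_fn))
# [('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 2)]
```

Everything outside `map_fn` and `reduce_fn` is what the MapReduce infrastructure takes off the developer's plate: parallelization, machine failures, data placement and monitoring.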

Currently, almost every major product at Google uses MapReduce in some way. There are 6000 MapReduce applications checked into the Google source tree, with hundreds of new applications that utilize it being written per month. To illustrate its ease of use, a graph of new MapReduce applications checked into the Google source tree over time shows a spike every summer as interns show up and create a flood of new MapReduce applications.

MapReduce Server Architecture

There are three server types in the MapReduce system.

Master server
This assigns user tasks to map and reduce servers as well as keeps track of the state of these tasks.
Map Servers
Accepts user input and performs map operation on them then writes the results to intermediate files
Reduce Servers
Accepts intermediate files produced by map servers and performs reduce operation on them.

One of the main issues they have to deal with in the MapReduce system is the problem of stragglers. Stragglers are servers that run slower than expected for one reason or another. Sometimes stragglers are due to hardware issues (e.g. a bad hard drive controller reducing I/O throughput) or may just be from the server running too many complex jobs which utilize too much CPU. To counter the effects of stragglers, they now assign the same job to multiple servers, which counterintuitively ends up making tasks finish quicker. Another clever optimization is that all data transferred between map and reduce servers is compressed; since the servers usually aren't CPU bound, compression/decompression costs are a small price to pay for bandwidth and I/O savings.


After the creation of GFS, the need for structured and semi-structured storage that went beyond opaque files became clear. Examples of situations that could benefit from this included

  • associating metadata with a URL such as when it was crawled, its PageRank™, contents, links to it, etc
  • associating data with a user such as the user's search history and preferences
  • geographical data such as information about roads and satellite imagery

The system required would need to scale to storing billions of URLs, hundreds of terabytes of satellite imagery, preferences associated with hundreds of millions of users, and more. It was immediately obvious that this wasn't a task for an off-the-shelf commercial database system due to the scale requirements and the fact that such a system would be prohibitively expensive even if it did exist. In addition, an off-the-shelf system would not be able to make optimizations based on the underlying GFS file system. Thus BigTable was born.

BigTable is not a relational database. It does not support joins nor does it support rich SQL-like queries. Instead it is more like a multi-level map data structure. It is a large scale, fault tolerant, self managing system with terabytes of memory and petabytes of storage space which can handle millions of reads/writes per second. BigTable is now used by over sixty Google products and projects as the platform for storing and retrieving structured data.

The BigTable data model is fairly straightforward, each data item is stored in a cell which can be accessed using its {row key, column key, timestamp}. The need for a timestamp came about because it was discovered that many Google services store and compare the same data over time (e.g. HTML content for a URL). The data for each row is stored in one or more tablets which are actually a sequence of 64KB blocks in a data format called SSTable.
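The multi-level map model described above can be sketched as a toy in-memory structure. This is purely illustrative (the class is invented and ignores tablets, SSTables and persistence entirely); it shows only the {row key, column key, timestamp} addressing and versioned cells:

```python
import time

class ToyBigTable:
    """Toy sketch of BigTable's data model: a map of maps, where each
    cell is addressed by {row key, column key, timestamp}."""

    def __init__(self):
        self.cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, value, timestamp=None):
        ts = timestamp if timestamp is not None else time.time()
        self.cells.setdefault((row, column), {})[ts] = value

    def get(self, row, column, timestamp=None):
        versions = self.cells[(row, column)]
        if timestamp is None:  # latest version by default
            timestamp = max(versions)
        return versions[timestamp]

# Timestamps let services store and compare the same data over time,
# e.g. the HTML content of a URL across successive crawls.
t = ToyBigTable()
t.put("com.example/index.html", "contents", "<html>v1</html>", 1)
t.put("com.example/index.html", "contents", "<html>v2</html>", 2)
print(t.get("com.example/index.html", "contents"))     # latest crawl
print(t.get("com.example/index.html", "contents", 1))  # earlier crawl
```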

BigTable Server Architecture

There are three primary server types of interest in the BigTable system.

Master servers
Assigns tablets to tablet servers, keeps track of where tablets are located and redistributes tasks as needed.
Tablet servers
Handle read/write requests for tablets and split tablets when they exceed size limits (usually 100MB - 200MB). If a tablet server fails, then 100 tablet servers each pick up one of its tablets and the system recovers.
Lock servers
These are instances of the Chubby distributed lock service. Lots of actions within BigTable require acquisition of locks including opening tablets for writing, ensuring that there is no more than one active Master at a time, and access control checking.

There are a number of optimizations which applications can take advantage of in BigTable. One example is the concept of locality groups. For example, some of the simple metadata associated with a particular URL which is typically accessed together (e.g. language, PageRank™ , etc) can be physically stored together by placing them in a locality group while other columns (e.g. content) are in a separate locality group. In addition, tablets are usually kept in memory until the machine is running out of memory before their data is written to GFS as an SSTable and a new in memory table is created. This process is called compaction. There are other types of compactions where in memory tables are merged with SSTables on disk to create an entirely new SSTable which is then stored in GFS.

Current Challenges Facing Google's Infrastructure

Although Google's infrastructure works well at the single cluster level, there are a number of areas with room for improvement including

  • support for geo-distributed clusters
  • single global namespace for all data since currently data is segregated by cluster
  • more and better automated migration of data and computation
  • lots of consistency issues when you couple wide area replication with network partitioning (e.g. keeping services up even if a cluster goes offline for maintenance or due to some sort of outage).

Recruiting Sales Pitch

[The conference was part recruiting event so some of the speakers ended their talks with a recruiting spiel - Dare]

Having access to lots of data and computing power is a geek playground. You can build cool, seemingly trivial apps on top of the data which turn out to be really useful, such as Google Trends and catching misspellings of "britney spears". Another example of the kind of app you can build when you have enough data is treating language translation as a statistical modeling problem, which turns out to be one of the most successful methods around.

Google hires smart people and lets them work in small teams of 3 to 5 people. They can get away with teams being that small because they have the benefit of an infrastructure that takes care of all the hard problems so devs can focus on building interesting, innovative apps.


Categories: Platforms | Trip Report

April 29, 2007
@ 12:13 PM

I really got into Nigerian hip hop and R&B music while I was there over the past few weeks. Below are links to my favorite songs from my trip, many of which are fairly old but were new to me.

  1. Tongolo by D'Banj: A club banger done in a mix of pidgin English and Yoruba

  2. Raise the Roof by Jazzman Olofin: Don't be fooled by the English title this song is mostly in Yoruba. The song is a general exhortation to dance which is a fairly popular topic for Yoruba hit music

  3. Iya Basira by Styl-Plus: A humorous song about a guy who gets so hooked on food from Iya Basira's (i.e. Basira's Mom) restaurant that he thinks she is using jazz (i.e. magic, voodoo, juju, etc) to make the food taste so good.

  4. Nfana Ibaga (No Problem) by 2Face Idibia: The opening rap is beyond wack but the song itself is quite good. He scored an international hit with a song called African Queen which I really didn't feel that much.

  5. Imagine That by Styl-Plus: This is a fairly crappy video but I love the song. The chorus is a mix of Yoruba and English. Roughly translated it goes "Imagine That! She says she doesn't want us to do this anymore. Imagine That! After everything I've done for her. Imagine That! What does she expect to become of me if she goes. Imagine That! If she goes".


Categories: Music | Personal | Trip Report

Linking to Niall Kennedy's blog reminded me that I owed him an email response to a question he asked about a month ago. He asked what I thought about the diversity of speakers at the Widgets Live conference given my comments on the topic in my blog post entitled Who Attends 'Web 2.0' Conferences.

After thinking about it off and on for a month, I realize that I liked the conference primarily because of its content and focus. The speakers weren't the usual suspects you see at Web conferences, nor were they homogeneous in gender and ethnic background. I assume the latter is a consequence of the fact that the conference was about concrete technical topics as opposed to a gathering to gab with the hip Web 2.0 crowd, which meant that the people who actually build stuff were there...and guess what, they aren't all Caucasian males in their 20s to 30s, regardless of how much conferences like The Future of Web Apps and Office 2.0 pretend otherwise.

This is one of the reasons I decided to pass on the Web 2.0 conference this year. It seems I may have made the right choice given John Battelle's comments on the fact that a bunch of the corporate VP types that spoke at the conference ended up losing their jobs the next week. ;)


Categories: Trip Report

These are my notes from the session on Success Story: PhotoBucket.

PhotoBucket is a video and image hosting site that sees 7 million photos and 30,000 videos uploaded daily. They serve over 3 billion pieces of media a day. The site has 15 million unique users in the U.S. (20 million worldwide) and has 80,000 new accounts created daily. There is now a staff of 55 people whose job it is to moderate content submissions to ensure they meet their guidelines.

The top sites linking to their images used to be eBay and LiveJournal, but the key drivers of traffic are now social networking sites such as MySpace and Xanga. There is 30% - 40% overlap between their user base and the users of social networking websites.

There was some general advice about widgets, such as being careful about hosting costs which may pile up quickly if your widgets become popular, and being wary of trying to monetize users via your widgets because some sites, such as eBay, frown upon that behavior. However, well designed and compelling widgets can drive a lot of traffic back to your site, the best example of this to date being YouTube.

The speaker then gave a timeline of notable occurrences in the MySpace widgets world, from MySpace blocking Revver & YouTube to the recent explosion of new widgets in the past few months, from MeeboMe to a number of photo slideshow widgets from the major image hosting services.

Pete Cashmore over at Mashable.com has compiled some statistics on the most popular widgets on MySpace which shows the relative popularity of PhotoBucket's widgets in comparison to other services.

So what's in a name? They've renamed the feature from BucketFeatures to Widgets and now to 'Slide Shows' because none of their non-Silicon Valley users knew what widgets were. After the rename from 'Widgets' to 'Slide Shows', the usage of the feature almost doubled within a month.

They've also designed a JWidget which allows people to log in to their PhotoBucket account to access their videos and images as well as upload new ones. This way people can outsource both image upload and content moderation to PhotoBucket. They now see 16,000,000 logins a month via JWidget from about 500 partner sites. It is named JWidget because the developer's name begins with 'J'. :)

During the Q&A someone asked if they support tagging & open APIs. The response was that they don't do tagging and their user base has never asked for tagging. With 2500 support tickets a day, none of them have ever been about tagging. Also, since it is just an image hosting service, tagging is probably more appropriate for the blog post or profile the image is appearing in than on the hosted image. They don't have an API primarily due to resource constraints; there are only 40 people at the company working on the service.


Categories: Trip Report

These are my notes from the session on Success Story: MeeboMe.

Meebo started as a way for the founders to stay in touch with each other when they were at places where they couldn't install their IM client of choice. They realized that Instant Messaging hadn't really met the potential of the Web and decided to create a startup to bring IM to the Web. Today they have grown to a site with 1 million logins daily, 4 million unique users a month and 64 million messages sent a day.

MeeboMe is an embeddable IM window you can drop on any webpage. People can see your online status. Even cooler is that it allows the Meebo user to see who is viewing that page and then send them an IM in real time while they are viewing the page. That is fucking cool. I'm so blown away that I've decided to figure out a way to get MeeboMe on Windows Live Spaces and will start looking into how to get that to happen when I get back to work.

There are three main reasons they built the MeeboMe widget: it meets their core mission of bringing IM to the Web, it drives use of Meebo.com, and their users asked for it. :)

Their design principles have been quite straightforward. They have used Flash and protocols like Jabber/XMPP that already exist and that they are familiar with to ease development. They try to keep features to a minimum and focus on making Meebo.com act like the traditional IM experience. They have had to deal with performance issues around sending/receiving messages and showing changes to a user's online presence without significant lag. They are also very driven by user feedback, and the Meebo blog is embedded in the Meebo web experience when users sign in. User feedback is how they determined that being able to show emoticons in instant messages was more important to users than being able to add IM buddies from Meebo.
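The talk didn't show any of Meebo's code, but since they build on Jabber/XMPP, the core of relaying a chat message between a page visitor and the page owner boils down to constructing XMPP message stanzas. A minimal sketch of that idea (all names here are illustrative, not Meebo's actual implementation):

```typescript
// Minimal sketch of building an XMPP <message> stanza of the kind a
// web-based IM service built on Jabber/XMPP would relay to its server.
// The addresses and function names are hypothetical examples.

// Escape XML special characters so message bodies can't break the stanza.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a one-to-one chat stanza (type="chat" per the XMPP IM spec).
function buildMessageStanza(from: string, to: string, body: string): string {
  return `<message from="${from}" to="${to}" type="chat">` +
         `<body>${escapeXml(body)}</body></message>`;
}
```

For example, `buildMessageStanza("visitor@example.net", "owner@example.net", "hi <there>")` yields a stanza whose body is safely escaped before it goes over the wire.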

MeeboMe is used in a lot of places. In education, high school teachers and college professors use it to give students a way to contact them. Librarians have also used it to let patrons ask the librarian questions by placing the MeeboMe widget on the front page of the library's website. There is a radio DJ who takes requests from the MeeboMe widget on his site. There are also retail sites that use MeeboMe for customer support. One trend they didn't expect is that people place different MeeboMe widgets on different pages of their site so they can have a different buddy list entry for each page.

During the Q&A someone asked if MeeboMe drove account creation on Meebo.com and the answer was "Yes". They had their largest number of new accounts up to that date when they launched the widget.


Categories: Trip Report

These are my notes from the session on Fox Interactive Media by Dan Strauss.

Fox Interactive Media (FIM) is the parent company of MySpace. It also owns IGN, Fox.com, FoxSports.com, AskMen.com, Rotten Tomatoes and Gamespy. They have 120 million visitors across all the sites.

They are buying small dev teams like Sidereus and Newroo as well as big companies like MySpace & IGN. They created FIM Labs so that some of the small dev teams can continue to be innovative. FIM Labs focuses on incubation of new technologies, product development and technology evangelization to FIM properties. The folks from Sidereus worked on Spring Widgets, the new platform being announced at this session.

Why widgets? They have a goal of cross-pollinating users across the various FIM properties and of creating a platform that can tie their businesses together. Widgets have been gaining traction and seemed like the right vehicle for furthering their goals.

Sidereus had a desktop software background and researched Konfabulator, Dashboard and Vista gadgets. They also looked at Web widgets, specifically the AJAX and Flash widgets being used by MySpace users. They want users to be able to add widgets for FIM websites to their MySpace profiles and their desktop. From the Spring platform site a user can find a widget and then add it to their MySpace profile. No more cutting and pasting code; the experience is similar to Windows Live Gallery for MySpace. Users can also drag and drop widgets from the Web onto the desktop. Only Windows desktop widgets are supported for now but Mac support is on the way.

The Spring Widgets platform is 100% Flash. Adding a desktop widget requires installing the Spring Widgets runtime in addition to having Flash installed. This runtime is less than 2MB. There is an SDK so widget developers get APIs that can tell if the user is online or offline, store some persistent state, detect certain UI conditions such as the widget's window size, and more. There is also a Web simulation tool so developers can test their widgets without having to upload them to a Website.
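The session didn't show the actual SDK calls, but the capabilities described (persistent state plus online/offline awareness) would look roughly like the following sketch. All class and function names here are invented for illustration; the real Spring Widgets SDK is Flash/ActionScript and its API may differ entirely:

```typescript
// Hypothetical sketch of the kind of persistent-state and connectivity
// APIs the Spring Widgets SDK was described as offering. Names are
// invented; this is not the real SDK.

class WidgetPreferences {
  private store: Map<string, string> = new Map();

  // Persist a setting across widget restarts (held in memory here;
  // a real runtime would write it to disk on the user's behalf).
  set(key: string, value: string): void {
    this.store.set(key, value);
  }

  // Read a previously stored setting, falling back to a default.
  get(key: string, fallback: string): string {
    return this.store.get(key) ?? fallback;
  }
}

// A widget told whether the user is online can switch between
// fetching live data and showing its last cached state.
function pickDataSource(isOnline: boolean): "live" | "cache" {
  return isOnline ? "live" : "cache";
}
```

The design point is that the runtime, not each widget author, owns the hard parts (storage, connectivity detection), which is what makes tiny Flash widgets viable on the desktop.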

The talk was followed by a demo showing how easy it is to build a Spring widget using WYSIWYG Flash development tools. They also announced a partnership with FeedBurner.

There were several questions during the Q&A that resulted in an answer of "we're still figuring things out". It was clear that although the technology may be ready, there are a number of policy questions that are still left to be answered, such as whether there will be integration of the Spring Widgets site into the MySpace UI (similar to how Windows Live Gallery is integrated into Windows Live Spaces) or what the certification process will be for getting 3rd party widgets hosted on the Spring Widgets site.

Despite the open questions this is definitely a very bold move on the part of Fox Interactive Media. It does raise the question, though: if every widget platform has its own certified widgets gallery built on its own technology (e.g. Flash in the case of Spring Widgets, DHTML and XML in the case of Windows Live Gallery, and proprietary markup in the case of the Yahoo! Widgets Engine), then there is either going to have to be some standardization or else there may be a winner-takes-all outcome where widget developers target one or two major widget platforms because they don't have the resources to support every homebrew Flash or AJAX platform out there.


Categories: Trip Report

These are my notes from the session on Konfabulator by Arlo Rose.

He started by answering the question, why name them 'widgets'? At Apple, a UI control was called a widget. He thought the name meant something more and has always wanted to build widgets that do more.

He was the creator of Kaleidoscope which was one of the key customization and theming applications on the Macintosh. The application was so popular that the CEO of Nokia mentioned it as the inspiration for customization in cell phones.

When Apple announced Mac OS X, he became nervous that this would spell the end of Kaleidoscope, and it did; they couldn't make the transition to Cocoa so they killed the product. Arlo then looked around for a new kind of application to build and came across the Control Strip and Control Strip Modules on the Mac, which he thought were useful but had a bad user experience. He had also discovered an MP3 player for the Mac named Audion which used cool UI effects to create little UI components on the desktop which seemed transparent. Arlo thought it would be a great idea to build a better Control Strip using Audion-like UI. He talked to his partner from Kaleidoscope but he wasn't interested in the idea. He also talked to the developers of Audion but they weren't interested either. So Arlo gave up on the idea and wandered from startup to startup until he ended up at Sun Microsystems.

At Sun, he was assigned to a project related to the Cobalt Qube which eventually was cancelled. He then had time to work on a side project, so he resurrected his idea for building a better Control Strip with an Audion-like user interface. He originally wanted to develop the project using Perl and XML as the development languages but he soon got some feedback that creative types on the Web are more familiar with JavaScript. So in 2002 he started on Konfabulator and released version 1.0 the following year. They also created a widget gallery that enabled developers to upload widgets they've built to share with other users. However, they didn't get a lot of submissions from developers, so they talked to developers and got a lot of feedback on features to add to their platform such as drag and drop, mouse events, keyboard events and so on. Once they did that they started getting dozens and dozens of developer submissions.

After they got so much praise for the Mac version, they decided to work on a Windows version. While working on the Windows version, he got a call from a friend at Apple who said that while he was at a design meeting he heard "We need to steamroll Konfabulator". He started calling all his friends at Apple and eventually it turned out that the Apple product intended to steamroll Konfabulator was Dashboard. The products are different: Dashboard uses standard DHTML while Konfabulator uses proprietary markup. Arlo stated that their use of proprietary technologies gave them advantages over using straight DHTML.

Unfortunately, even though they got millions of downloads of the Windows version, not a lot of people paid for the software. There were a number of reasons for this. The first was that in general there is less of a culture of paying for shareware in the Windows world than in the Mac world. Secondly, free alternatives to their product had sprung up on Windows while there was only a Mac version. In looking for revenue, they sought out partnerships and formed one with Yahoo!. He also talked to people at Microsoft in Redmond who let him know that they were planning to add gadgets to Windows Vista (then codenamed Longhorn). Microsoft made him an offer to come work on Windows Vista but he turned it down. Later on, he was pinged by a separate group at MSN that expressed an interest in buying Konfabulator. Once this happened, Arlo contacted Google and Yahoo! to see if they'd make counter offers and Yahoo! won the bidding war.

They started working on the Yahoo! Widget Engine and the goal was to make it a platform for accessing Yahoo! APIs as part of the Yahoo! Developer Network. However consumers still wanted a consumer product like Konfabulator and eventually they left the YDN and went to the Connected Life group at Yahoo! which works on non-Web consumer applications such as desktop and mobile applications.

There are now 4000 3rd party widgets in the Yahoo! widget gallery and they are the only major widget platform which is cross-platform. They are also the only widget platform that has total access to Yahoo! data.

Q & A

Q: What's next?
A: The next step is to see how far widgets can scale as mini-applications. Can a picture frame widget become something more, though not a full replacement for Flickr or Photoshop?

Q: What do you think of the Apollo project from Adobe?
A: Doesn't know what it is.

Q: Did he ever figure out a business model for widgets?
A: He planned to make deals with companies like J.Crew, Staples, and Time Warner for movie tie-ins.

Q: Why move from YDN to Connected Life?
A: They were 3 people and couldn't do both the developer side & the consumer application. Also, the Yahoo! Developer Network turned out not to have the clout they thought it would, in that Yahoo! application teams would refuse to provide APIs that could be accessed by 3rd party developers but would create special APIs for writing Konfabulator widgets.


Categories: Trip Report

Although I won't be attending MIX '06 this week, I am interested in what people think of some of the announcements that will be made around building Web platforms with Microsoft technologies and various Windows Live announcements this week. I am also surprised to see that a bunch of talks are from external folks from companies like Yahoo! and Amazon which is nice. 

Some of the sessions I'd attend if I was in Las Vegas would be

Speaker(s): Scott Isaacs
Explore the challenges and lessons learned developing the Windows Live and Gadgets Web client frameworks powering Windows Live, Hotmail (Kahuna beta), Spaces, and more. This technical talk presents design and architectural considerations for building interactive AJAX-like sites. See how componentization, network management, accessibility, page composition, and more impact the design and engineering of your Web application.

Speaker(s): Brian Arbogast, Ken Levy
Windows Live provides unique opportunities for developers from hobbyists to large ISVs to build social networking applications on top of the largest contact and address book database on the Internet. Developers can build these applications utilizing Windows Live services such as instant messaging, search, location based mapping, blogging, gadgets, and others. These can be AJAX style Web applications, run within Messenger or other rich applications. The Windows Live Platform provides several business models based on revenue sharing and paid placement. Brian Arbogast, Vice President of the Windows Live Platform, discusses and demonstrates the latest developments of the Windows Live platform.

Speaker(s): Doug Purdy, Clemens Vasters
Your site is more than a collection of pages; it's a programmable platform that your users are leveraging in innovative new ways. Scraping, mashups, and RSS mean that your site is already a service, and the fastest, most flexible way to build that service is with Windows Communication Foundation (WCF). With WCF you can expose your site over a whole host of different transports and formats, ensuring that clients of all kinds can access your content. Use WCF to take your site to the next level and provide an optimized experience for all of your users.

DIS004 - Beyond the Banner: Advertising on the Web and Where It’s Going
Speaker(s): Ron Belanger (Yahoo!), Bant Breen (Interpublic Group), David Jakubowski, Jeff Lanctot (Avenue A), Jed Nahum, Jason Rapp (New York Times), Jennifer Slegg (JenStar)
Advertising revenues in traditional venues like print magazines and television are declining, while online advertising is exploding. Advertisers want richer online ad platforms. Content providers want ads that maximize revenue without negatively impacting the user experience. What role will advertising play in "podcasting," "video blogging" and other emerging media? Join a panel of industry luminaries to discuss these and other issues.

Speaker(s): Jeff Barr (Amazon)
Amazon subsidiary Alexa.com is leveling the search playing field. For the first time, developers looking to build the next "big thing" in search or an ultra custom search engine have access to the 300 terabytes of Alexa crawl data, along with the utilities to search, process, and publish their own custom subset of the data-all at a reasonable price. Developers no longer need a million dollar budget or to reinvent the wheel designing search algorithms, to be able to build their own search engines or create customized Web services based on data from the Alexa crawl. As a full-service Web analysis and Web service publication platform, the Alexa Web Search Platform should allow any user with an Internet connection to access Web content on a large scale and provide new services or applications to the online community. Jeff Barr provides an overview of the Alexa API and shows developers and designers how to get on the new, leveled search playing field.

Speaker(s): Jeffrey McManus (Yahoo!)
Yahoo! is opening up to developers using Web services. Today our services enable developers to access Yahoo! properties as diverse as Web search, maps, Flickr, comparison shopping, and many more -- and we're making more available all the time. In this session you'll learn how you can incorporate Yahoo! Web services in your application or Web site, and see a demonstration of integration between Yahoo! Web Services, the new Yahoo! Presentation Library, and ASP.NET "Atlas".

Focus(s): Architect, Developer
Session Type(s): Hands-On Lab
In this lab, you'll create a gadget that mashes up concert event info from podbop.org with images from flickr.com

Speaker(s): Alex Daley
As content and services are delivered to customers in more locations, on more devices, users are expecting information to be more tailored to their context and more relevant not to just "what" they are looking for, but "where". Microsoft's Alex Daley hosts this session exploring ways to reach customers with location relevant information, building richer experiences that make your site or application more "sticky" and fulfilling for users. Alex covers using Microsoft's Virtual Earth development technologies to build applications from store locators to location-based social networking to customized vertical search experiences.

BTB025 - Developing Interactive Applications Using Windows Live Robots, Activities, and Alerts
Speaker(s): Pierre Berkaloff (Conversagent), Campbell Gunn, John Kim (Conversagent)
Learn how to create rich and deeply integrated applications leveraging the 200 million worldwide Windows Live Messenger users. Windows Live Messenger offers a unique platform for building applications that provide a shared experience, such as joint shopping, multi-user gaming, customer support, and more. Windows Live Messenger applications can use a combination of features including BOTS, Alerts, and Activities (which is the application window within Messenger), as well as social networking. Learn about the business opportunities exposed by the Windows Live Messenger platform and details on how to build applications that capitalize on these opportunities.

Speaker(s): Brad Abrams, Rick Spencer
This session explores best practices for designers and developers who are tackling the real challenges of building AJAX-style user experiences on the Web. Explore a few key principles that are the hallmark of the modern web from developers on the Microsoft Live and "Atlas" teams. These principles will be illustrated with real-world examples from the Windows Live Local and you will learn how they are easy to leverage with Atlas. Discover how the "Atlas" controls and components remove the complexity from designing rich, interactive experiences, and help you build AJAX-style applications more quickly.

Speaker(s): Garrett Serack
"InfoCard" can bring a new level of security to authenticating users to your site. In this session, take a deep developer look at how this can be achieved. A traditional forms-based authentication implementation is converted to use InfoCard, along with explanations of the Web services, protocols, and security considerations that one needs to understand.

Speaker(s): David Jakubowski, Jed Nahum
adCenter is the next generation of online advertising that will allow you to conveniently plan, execute, and adjust your online advertising programs. Get the insider view of our current search advertising pilot in the U.S., and a preview of the innovations we're testing at the Microsoft adLabs.

Definitely a diverse set of talks. Check them out and let me know what you learned.


Categories: Trip Report | Windows Live

These are my notes from the session G/localization: When Global Information and Local Interaction Collide by danah boyd.

danah boyd began by pointing out what she means by G/localization. It is the ugliness that ensues when you bring the global and the local together. Today online spaces enable us to cross space and time. We can communicate with faraway peoples in the blink of an eye, but in truth most of us do not live our lives in a multicultural environment, which can cause problems when we build or participate in online communities. A culture is the artifacts, norms and values of a people. It isn't necessarily limited to nation-states, languages or ethnic groups; a company can have a 'corporate culture' and lots of attendees of ETech would probably identify themselves as being part of the 'geek culture'. In addition, people tend to exist in multiple cultural frames simultaneously but don't usually notice until they are extracted from their normal routine (e.g. going on vacation).

There was once an assumption that mass media would lead to cultural homogenization. Although this is true in some respects, it has also led to some subcultures forming that are a direct reaction to the mass culture, such as the raver and goth subcultures among adolescents. Similar subcultures occur in online forums dating as far back as USENET, where newsgroups like rec.motorcycles were very different from others like rec.golf. In that era, social software tended to come in two distinct flavors. There was homogeneous software that handled the communication needs of single groups, such as mailing list software, and then specialized software built to handle the needs of a particular community, such as Well.com.

Craig's List, Flickr and MySpace are examples of a new generation of successful social software. All three services have the following basic characteristics

  • Passionate designers and users: The creators of the services are passionate about the service they've created and use it themselves. All three services were seeded by friends and family of the founder(s) who became the foundation of a strong base of passionate users.

  • Public Personalities: Tom (MySpace), Stewart (Flickr) and Craig (Craig's List) put a human face on the service by directly interacting with users either in support roles or to give updates on the status of the service.

  • Integrated feedback loop: Changes to the sites are driven by customer demand which is often given directly to the people building the products

In anthropology there is a notion of 'embedded observation' where the researcher lives with the society being studied so as to learn from within the community instead of from outside. The designers of all three services seem to live by this principle when it comes to the cultures they've fostered. One thing they do well is that they tend to watch, listen and learn directly from users instead of keeping user research as something outside the core design and development process, done mainly for marketing purposes as is the case with many services. These services actually focus on 'real users' as opposed to personas or other caricatures of their user base. Another thing the above mentioned services do well is that they tend to nudge the culture instead of trying to control it. The Fakester saga on Friendster is an example of where the designers of a service tried to control the burgeoning culture of the service instead of flowing with it. Great social software services support and engage the community that has grown around their service instead of trying to control them.

It isn't all plain sailing, there are some key problems that face sites such as Craig's List, Flickr and MySpace including

  • Creator burnout: Being passionate about the product you work on often leads to overworking which eventually leads to burning out. Once this happens, it is hard for creators to maintain cultural embeddedness which leads to disconnects between the designers and users of the services.

  • Scaling: As a user base becomes more diverse it is often hard to deal with the increased cultural or even linguistic diversity of a service. An example is Orkut, which became very popular amongst Brazilian users even though none of the people working on the product understood their language. Secondly, as services become larger they become harder to police, which can eventually have significant consequences. Both Craig's List and MySpace are facing lawsuits because people feel they haven't effectively policed their services.

danah boyd then gave some guidelines for creators of social software that want to design through embeddedness

  1. Passion is everything
  2. Have safeguards in place to prevent burnout
  3. Diversify your staff
  4. Do not overdesign
  5. Enable and empower your users. Don't attempt to control them, sometimes they might go in a different direction from what you intended and that's OK.
  6. Integrate the development, design and customer support teams so they all know each other's pain.
  7. Stay engaged with the community
  8. Document the evolution of your community especially what aspects of the culture have driven feature design

The next topic was why people join online communities. The fact is most people like hanging out with people who are like them such as people who live in the same region, are of the same ethnic group or just share the same interests. Most people like to meet "new" people but not "different" people from them. There is also something about seemingly accidental or coincidental meetings that many people like. For example, two people can see each other on the bus every day for years and never talk but once they meet somewhere else they can spark up conversation about their shared identity (i.e. riders of a particular bus route). danah described this concept as the notion of familiar strangers.

danah boyd then showed some examples of the kind of speech used on services like MySpace, which is similar to L33t5p34k. She asserted that the creation of such dialects by teenagers is an attempt to assert their independence and at the same time obfuscate their speech from grown ups. In addition, she challenged the notion that machine translation would ever be able to bridge languages due to cultural notions embedded in these languages. Simply translating teenage online speech to regular English in a mechanical manner loses some of the meanings of the words that are only understood by members of that community. One example she gave is the word 'nigga'. Depending on the culture of the speakers it could be an affectionate term between males ("That's my nigga") or one which is intensely negative ("I can't believe Kimberly is a nigger lover"). Machine translation can't figure out the difference. Another real-world example which affects online communities is defining obscenity and pornography. Even the U.S. Supreme Court has given up on being able to properly define obscenity and pornography, saying it depends on the standards of the community. However, when the community becomes anyone in the world with an Internet connection, things become tricky. In the United States it's obscene to show women's nipples in public; in Brazil you often find bare breasted women in national magazines, while in the United Arab Emirates a bare belly button is considered obscene. A picture considered tame in one country could be considered raunchy and obscene in others. danah talked about a conversation she once saw on Flickr where women from the UAE were commenting on some photos of American women in tank tops and hot pants, expressing sorrow that women in the U.S. need to objectify themselves sexually to be accepted by mainstream society.
People like to argue about morality when it comes to building online services but the question is "Whose morality, yours or theirs?" This question becomes important to answer because it can lead to serious ramifications from lawsuits to your website being blocked in various countries.

In conclusion, danah boyd gave the following summary of what to do to design for G/localization

  • Empower users to personalize their experience
  • Enable users to control access to their online expressions of their personality by being able to make things private, public, etc.
  • Let users control opportunities for meeting people like them

I loved this talk. This was the only talk I attended where the Q&A session went on for ten minutes past when the talk was supposed to end and no one seemed ready to leave. danah boyd r0cks.


Categories: Trip Report

These are my notes from the session Feeds as a Platform: More Data, Less Work by Niall Kennedy, Brent Simmons and Jane Kim.

Niall Kennedy started off by talking about implementations of subscriptions and syndication feeds from the early days of the Web, such as PointCast, Netscape NetCenter's Inbox Direct feature and Backweb. That was back in 1997. Today in 2006, there are dozens of applications for subscribing to and consuming syndication feeds. Niall then broke feed readers up into four main categories: desktop, online, media center and mobile.

Desktop RSS readers are usually text-centric and follow the 3-pane mail/news reader model. Consuming rich media from feeds (i.e. podcasts) is often handed off to other applications instead of being an integrated part of the experience. Finally, a problem with desktop feed readers is that users cannot access their feeds from anywhere and at anytime. Online feed readers offer a number of advantages over desktop feed readers. An online reader is accessible from any computer, so there is no problem of having to keep one's feeds in sync across multiple machines. Feeds are polled without the user needing to have an application constantly running, as is the case with desktop readers. And the feed search capabilities are often more powerful since they can search feeds the user isn't subscribed to. Feed readers that are built into Media Centers enable people to consume niche content that may not be available from mainstream sources. Media Center feed readers can take advantage of the user's playlists, recommendations/preferences and the time-shifting capabilities of the Media Center to provide a consistent and rich experience whether consuming niche content from the Web or mainstream content from regular television. Mobile readers tend to enable scenarios where the user needs to consume content quickly, since users either want highly targeted information or just a quick distraction when consuming mobile content.
Niall then gave examples of various kinds of feed readers such as iTunes which does audio and video podcasts, FireAnt which does video podcasts, and Screen3 which is a feed reader for mobile phones.

The talk then focused on some of the issues facing the ecosystem of feed providers and readers

  • too many readers hitting feed providers
  • payloads to feeds getting bigger due to enclosures
  • users suffering from information overload due to too many feeds
  • subscription and discovery are complicated
  • multiple feed formats
  • XML is often invalid - the Google Reader team blogged that 15% of feeds are invalid
  • roaming subscriptions between PCs when using a desktop reader

There are also developer-specific issues such as handling multiple formats, namespaced extension elements, searching feeds, synchronization and following HTTP best practices, all of which benefit from a platform that already does the heavy lifting. Companies such as Google, NewsGator and Microsoft provide different platforms for processing feeds which can be used to either enhance existing feed readers or build new ones.
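As an illustration of one of those HTTP best practices, a feed platform should poll with conditional GET so that unchanged feeds return a 304 rather than a full download. A minimal sketch in Python (the cache-entry shape is my own assumption):

```python
# Sketch of the conditional GET pattern a feed platform applies when polling:
# remember the ETag/Last-Modified validators from the last fetch and send them
# back, so the server can answer 304 Not Modified for unchanged feeds.

def conditional_headers(cache_entry):
    """Build request headers from validators seen on the previous poll."""
    headers = {}
    if cache_entry.get("etag"):
        headers["If-None-Match"] = cache_entry["etag"]
    if cache_entry.get("last_modified"):
        headers["If-Modified-Since"] = cache_entry["last_modified"]
    return headers

def update_cache(cache_entry, status, response_headers):
    """Record validators on a 200; on a 304 the cached copy stays untouched."""
    if status == 200:
        cache_entry["etag"] = response_headers.get("ETag")
        cache_entry["last_modified"] = response_headers.get("Last-Modified")
    return cache_entry
```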

Jane Kim then took over the session to talk about the Windows RSS platform and how it is used by Internet Explorer 7. She began by explaining that one of the compelling things about Web feeds is that the syndicated information doesn't have to be limited to text. Thanks to the power of enclosures one can subscribe to calendars, contact lists, and video and audio podcasts. For this reason the Internet Explorer team believes that consuming feeds goes beyond the browser, which is one of the reasons they decided to build the Windows RSS platform. The integration of feed consumption in Internet Explorer is primarily geared at enabling novice users to easily discover, subscribe to and read content from feeds. The Windows RSS platform consists of 3 core pieces that can be used by developers

  • Common Feed List - list of feeds the user has subscribed to
  • Download Engine - this manages downloading of feeds and enclosures
  • Centralized Feed Store - this is where the downloaded feeds and enclosures are stored on disk. All feed formats are normalized to a custom amalgam of RSS 2.0 and Atom 1.0.
By offering these centralized services it is hoped that this will lead to more sharing between feed reading applications instead of the current practice where information about a user's subscriptions is siloed within each feed reading application they use.
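The talk doesn't detail the normalized format, but the general idea of mapping both RSS 2.0 and Atom 1.0 items onto one common shape can be sketched like this (the normalized field names are my own, not the platform's):

```python
# Sketch of feed normalization: RSS 2.0 <item>s and Atom 1.0 <entry>s are
# mapped into one common dict shape so downstream code handles one format.
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def normalize(feed_xml):
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):                      # RSS 2.0 items
        items.append({"title": item.findtext("title"),
                      "link": item.findtext("link")})
    for entry in root.iter(ATOM + "entry"):             # Atom 1.0 entries
        link = entry.find(ATOM + "link")
        items.append({"title": entry.findtext(ATOM + "title"),
                      "link": link.get("href") if link is not None else None})
    return items
```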

Jane then completed her talk by showing some demos. She showed subscribing to a Yahoo! News feed in Internet Explorer 7. She also showed some integration with the Windows RSS platform that is forthcoming in the next version of FeedDemon and in the desktop gadgets in Windows Vista.

Niall Kennedy followed up by talking about the currently undocumented Google Reader API. The Google Reader team built an API which allows them to build multiple user interfaces for viewing a reader's subscriptions. The current view, called 'lens', is just one of many views they have in development. One side effect of having this API is that third party developers could also build desktop or web-based feed readers on top of the Google Reader API. The API normalizes all feeds to Atom 1.0 [which causes some data loss] and provides mechanisms for tagging feeds, flagging items, marking items read/unread, searching the user's feeds and ranking/rating of items.

Brent Simmons took over at this point to talk about the NewsGator API. Brent was standing in for Greg Reinacker who couldn't make it to ETech but will be at MIX '06 to talk about some of the new things they are working on. Soon after Brent started working on NetNewsWire he began to get demands from users for a solution that enabled them to use NetNewsWire from multiple computers but keep the information in sync. He came up with a solution which involved being able to upload/download the state of a NetNewsWire instance to an FTP server, which could then be used to synchronize another instance of NetNewsWire. This is the same solution I originally settled on for RSS Bandit.
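As a sketch of that upload/download synchronization model (the file format and merge rule here are my assumptions): each instance exports its read-item state to a file, uploads it to the server, and another instance merges the file on download:

```python
# Sketch of state-file sync between feed reader instances: one instance
# serializes which items it has read, another merges that state on download.
# The merge rule here (read anywhere => stays read) is an assumption.
import json

def export_state(read_items, path):
    """Serialize this instance's read-item set for upload."""
    with open(path, "w") as f:
        json.dump({"read": sorted(read_items)}, f)

def import_state(read_items, path):
    """Merge downloaded remote state into the local set."""
    with open(path) as f:
        remote = json.load(f)
    return read_items | set(remote["read"])
```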

After a while, Brent's customers started making more demands for synchronizing between feed readers on other platforms such as Windows or mobile phones and he realized that his solution couldn't meet those needs. He looked around at some options before settling on the NewsGator API. The Bloglines Sync API didn't work for synchronization since it is mainly a read-only API for fetching unread items from Bloglines. It is further complicated by the fact that fetching unread items from Bloglines marks them as read even if the user never reads them in the retrieving application. So basically it sucks a lot. He also looked at Apple's .Mac Sync but that is limited to the Mac platform.

The NewsGator API met all of Brent's needs by

  • using standard protocols (SOAP) which can be consumed from any platform
  • using incremental synchronization [only downloading changes instead of the entire application state] which improves bandwidth utilization
  • supporting granular operations [add feed, mark item read, rename feed, etc.]
Brent finished his talk by giving a demo of syncing NetNewsWire with the NewsGator Online Edition.


Categories: Trip Report

These are my notes from the session Search and the Network Effect by Christopher Payne and Frederick Savoye.

This session was to announce the debut of Windows Live Search, upgrades to the features of live.com, and the newly christened Windows Live toolbar (formerly MSN Search toolbar) which now comes with Onfolio.

The user interface of the live.com personalized portal has undergone an overhaul, a number of gadgets such as the weather and stock quotes gadgets now look a lot snazzier. To improve the RSS reading experience there is now the ability to expand the content of a news headline simply by hovering over it and then drill down into the content if necessary. In addition, the user interface for adding gadgets to one's page has been improved and is now more intuitive. Finally, a new feature is that one can now build multiple 'pages' to organize one's gadgets and feeds of interest. I like the idea of multiple pages. I'll probably end up with three on my start page; Gadgets, News, and Blogs. It'll definitely improve the cluttered look of my current start page.

Windows Live Search is the search experience you get when you do a search on live.com. When you do a web search, you no longer get a page of N results with a series of next links to get more results. Instead you get a stream of results and a smart scroll bar which you can use to scroll up or down to view the results. So if you do a search and want to view the 105th result, instead of clicking next until you get to the page showing results 101 to 150, you just scroll down. However, as noted in some comments on Slashdot, this may cause some usability problems. For one, I can no longer bookmark or remember that my search result was on the third page of search results. Secondly, the fact that the scroll bar isn't relative (i.e. if there are 2,000,000 search results, moving the scrollbar halfway down doesn't jump you to the 1,000,000th result) is counter to how people expect scroll bars to behave. Another innovation in Windows Live Search is the slider that is used to show more results in Web and Image search. In image search, the slider can be used to increase the number of search results, which resizes the thumbnails on the fly as more or fewer results are shown. This was quite impressive. There is also a 'Feed' search tab which can be used to search within RSS feeds, which can then be added to one's live.com page.

However the most interesting new feature of Windows Live Search is 'search macros'. A search macro is a shortcut for a complex search query which can then be added as a tab on the search page. For example, I can customize the search tab to contain the default Web, Image, Local and Feed searches as well as a dare.define search. The dare.define search would expand out to (site:wikipedia.org | site:answers.com | site:webopedia.com) and I'd use it when I was searching for definitions. Users can create their own search macros and share them with others. Brady Forrest of the Windows Live Search team has already created a few such as brady.gawkermedia which can be used to search all Gawker Media sites. Search macros basically allow people to build their own vertical search engines on top of the Windows Live Search experience and it is accessible to regular users. There are already dozens of macros on the Microsoft Gadgets website. In his demo, Christopher Payne showed the difference in search results when one searches for information about arctic oil drilling while limiting the search with a conservative-sites macro vs. a liberal-sites macro.

To perform a search using a macro just type "macro:[macroname] [query]" in the Windows Live Search text box, for example "macro:brady.seattle queen anne" searches a number of websites about Seattle for information about Queen Anne. There are a number of interesting operators one can use to build a macro besides site such as linkdomain, prefer and contains to name a few.
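Based on the syntax described above, the macro expansion step might look something like this sketch (the macro registry and the rewrite rule are assumptions for illustration):

```python
# Hypothetical sketch of how a "macro:[macroname] [query]" search could be
# rewritten into a plain query with site: operators before being executed.

MACROS = {
    "dare.define": "(site:wikipedia.org | site:answers.com | site:webopedia.com)",
}

def expand(query):
    """Rewrite 'macro:name terms' into 'terms <expansion>'; pass others through."""
    if not query.startswith("macro:"):
        return query
    name, _, terms = query[len("macro:"):].partition(" ")
    expansion = MACROS.get(name)
    return f"{terms} {expansion}" if expansion else query
```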

The Windows Live toolbar has learned a few new tricks since it was called the MSN Toolbar. For one, it now integrates with Windows Live Favorites so you can access your browser favorites from any machine or the Web. Another new feature is the addition of an anti-phishing feature which warns users when they visit a suspect web site. However the most significant addition is the inclusion of Onfolio, an RSS reader which plugs into the browser.


Categories: Trip Report

These are my notes from the session eBay Web Services: A Marketplace Platform for Fun and Profit by Adam Trachtenberg.

This session was about the eBay developer program. The talk started by going over the business models for 'Web 2.0' startups. Adam Trachtenberg surmised that so far only two viable models have shown up (i) get bought by Yahoo! and (ii) put a lot of Google AdSense ads on your site. The purpose of the talk was to introduce a third option, making money by integrating with eBay's APIs.

Adam Trachtenberg went on to talk about the differences between providing information and providing services. Information is read-only while services are read/write. Services have value because they encourage an 'architecture of participation'.

eBay is a global, online marketplace that facilitates the exchange of goods. The site started off as being a place to purchase used collectibles but now has grown to encompass old and new items, auctions and fixed price sales (fixed price sales are now a third of their sales) and even sales of used cars. There are currently 78 million items being listed at any given time on eBay.

As eBay has grown more popular they have come to realize that one size doesn't fit all when it comes to the website. It has to be customized to support different languages and markets as well as running on devices other than the PC. Additionally, they discovered that some companies had started screen scraping their site to give an optimized user experience for some power users. Given how fragile screen scraping is, the eBay team decided to provide a SOAP API that would be more stable and performant than having people screen scrape the website.

The API has grown to over 100 methods and about 43% of the items on the website are added via the SOAP API. The API enables one to build user experiences for eBay outside the web browser such as integration with cell phones, Microsoft Office, gadgets & widgets, etc. The API has an affiliate program so developers can make money for purchases that happen through the API. An example of the kind of mashup one can build to make money from the eBay API is https://www.dudewheresmyusedcar.com. Another example of a mashup that can be used to make money using the eBay API is http://www.ctxbay.com which provides contextual eBay ads for web publishers.

The aforementioned sites are just a few examples of the kinds of mashups that can be built with the eBay API. Since the API enables buying and listing of items for sale as well as obtaining inventory data from the service, one can build a very diverse set of applications.


Categories: Trip Report

These are my notes from the session Building a Participation Platform: Yahoo! Web Services Past, Present, and Future by Jeffrey McManus.

This was a talk about the Yahoo! Developer Network. Over the past year, Yahoo!'s efforts to harness the creativity of the developer community have led to the creation of a healthy developer ecosystem with tens of thousands of developers in it. They've built their ecosystem by providing web APIs, technical support for developers and disseminating information to the developer community via http://developer.yahoo.com. Over the past year they have released a wide variety of APIs for search, travel and mapping (AJAX, Flash and REST-based). They have also provided language-specific support for JavaScript and PHP developers by offering custom libraries (JavaScript APIs for cross-browser AJAX, drag & drop, eventing and more) as well as output formats other than XML for their services (JSON and serialized PHP). They plan to provide specific support for other languages including Flash, VB.NET and C#.

The Yahoo! APIs are available for both commercial and non-commercial use. Jeffrey McManus then showed demos of various Yahoo! Maps applications from hobbyist developers and businesses.

Providing APIs to their services fits in with Yahoo!'s plan to enable users to Find, Use, Share and Expand all knowledge. Their APIs will form the basis of a 'participation platform' by allowing users to interact with Yahoo!'s services on their own terms. They then announced a number of new API offerings

  • Browser-based authentication: This is a mechanism to allow mashups to authenticate a Yahoo! user and then call APIs on the user's behalf without having the mashup author store the username and password. Whenever the mashup wants to authenticate the user, it redirects the user to a Yahoo! login page, and once the user signs in they are redirected back to the mashup page with a token in the HTTP header that the mashup can use for authentication when making API calls. This is pretty much how Microsoft Passport works. I pointed this out to Jeffrey McManus but he disagreed; I assume this is because he didn't realize the technical details of Passport authentication. The application is given permission to act on behalf of the user for two weeks at a time, after which the user has to sign in again. The user can also choose to withdraw permission from an application.
  • Yahoo! Shopping API v2.0: This API will allow people to make narrow searches such as "Find X in size 9 men's shoes". Currently the API doesn't let you get as granular as Shoes->Men's Shoes->Size 9. There will also be an affiliate program for the Yahoo! Shopping API so people who drive purchases via the API can get money for it.

  • My Web API: This is an API for the Yahoo!'s bookmarking service called MyWeb.

  • Yahoo! Photos API: This will be a read/write API for the world's most popular photo sharing site.

  • Yahoo! Calendar API: A read/write API for interacting with a user's calendar

Most of the announced APIs will be released shortly and will depend on the browser-based authentication mechanism. This means they cannot be called by applications that aren't Web-based.
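A rough sketch of the token lifecycle described above, i.e. a token minted at sign-in, honored for two weeks, and revocable by the user (all names here are hypothetical, not Yahoo!'s actual API):

```python
# Sketch of a browser-based auth token store: tokens are issued at sign-in,
# expire after two weeks, and can be revoked by the user at any time.

TWO_WEEKS = 14 * 24 * 3600  # seconds

class TokenStore:
    def __init__(self):
        self._tokens = {}          # token -> (user, issued_at)

    def issue(self, token, user, now):
        self._tokens[token] = (user, now)

    def revoke(self, token):
        self._tokens.pop(token, None)

    def user_for(self, token, now):
        """Return the user if the token is known and unexpired, else None."""
        record = self._tokens.get(token)
        if record is None:
            return None
        user, issued_at = record
        if now - issued_at > TWO_WEEKS:
            return None
        return user
```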

In addition, they announced http://gallery.yahoo.com which aims to be a unified gallery to showcase applications built with Yahoo! APIs but focused at end users instead of developers.

Jeffrey McManus then went on to note that APIs are important to Yahoo!, which may explain why a lot of the startups they've bought recently such as del.icio.us, blo.gs, Flickr, Dialpad, Upcoming and Konfabulator all have APIs.

As usual, I'm impressed by Yahoo!


Categories: Trip Report

These are my notes from the session Musical Myware by Felix Miller.

This was a presentation about Last.fm which is a social music site. The value proposition of Last.fm is that it uses people's attention data (their musical interests) to make their use of the product better.

The first two questions people tend to ask about the Last.fm model are

  1. Why would people spy on themselves?
  2. Why would they give up their attention data to a company?
Last.fm gets around having to answer these questions by not explicitly asking users for their attention data (i.e. their musical interests). Instead, all the music they listen to on the site is recorded and used to build up a music profile for the user. Only songs the user listens to in their entirety are considered valid submissions, so as not to count songs the user skips through as something they like. The service currently gets about 8 million submissions a day and got over 1 billion submissions last year. One problem with submissions is that a lot of their songs have bad metadata; he showed examples of several misspellings of Britney Spears which exist in their song catalog today. For this reason, they only use the metadata from 8 million out of their 25 million songs for their recommendation engine.
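The "full listen" rule can be sketched as a simple filter over play events before they are counted into the profile (the exact threshold Last.fm uses may differ from this sketch):

```python
# Sketch of the valid-submission rule: only plays the user finished count
# toward the music profile, so skipped tracks don't pollute it.

def valid_submission(track_length_secs, played_secs):
    return played_secs >= track_length_secs

def build_profile(plays):
    """plays: iterable of (artist, track_length, played). Returns artist -> play count."""
    profile = {}
    for artist, length, played in plays:
        if valid_submission(length, played):
            profile[artist] = profile.get(artist, 0) + 1
    return profile
```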

The recommendation engine encourages users to explore new artists they would like as well as find other users with similar tastes. The site also has social networking features but they were not discussed in detail since that was not the focus of the talk. However the social networking features do show users one of the benefits of building up a music profile (i.e. hence giving up their attention data) since they can find new people with similar tastes. Another feature of the site is that since they have popularity rankings of artists and individual songs, they can recommend songs by obscurity or popularity. Appealing to the music snob in users by recommending obscure songs to them has been a cool feature.

The site does allow people to delete their music profile and extract it as an XML file as well.


Categories: Trip Report

These are my notes from the session Who Is the Dick on My Site? by Dick Hardt.

This was a talk about the services provided by Sxip Identity which identifies itself as an 'Identity 2.0' company. Dick Hardt started the talk by telling us his name 'Dick' and then showing us images of lots of other people named 'Dick' such as Dick Cheney, Dick Grayson, Dick Dastardly and a bunch of others. The question then is how to differentiate Dick Hardt from all the other Dicks out there on the Web.

In addition, Dick Hardt raised the point that people may have different personas they want to adopt online. He used women as a classic example of multiple persona syndrome given that they constantly change personalities as they change their clothes and hair. He used Madonna as a specific example of a woman who showed multiple personalities. I personally found this part of the presentation quite sexist and it colored my impression of the speaker for the rest of the talk.

So how does one tell a Web site who one is today? This usually involves the use of shared secrets such as username/password combinations but these are vulnerable to a wide variety of attacks such as phishing.

Besides telling sites who I am, it would be nice to have a way to also tell them about me so I can move from site to site and, just by logging in, have them know my music tastes, favorite books, and so on. However this could lead to privacy issues reminiscent of scenes from Franz Kafka's The Trial or George Orwell's 1984. There should be a way to solve this problem without having to deal with the ensuing privacy or security issues. I can't help but note that at this point I felt like I had time warped into a sales pitch for Microsoft Hailstorm. The presentation seemed quite similar to Hailstorm presentations I saw back in 2001.

Dick Hardt then talked about various eras of identity technology on the Web

  • Identity 1.0 - directory services and X.500
  • Identity 1.5 - SAML and other technologies that enable business partners to assert identity information about individuals. They require trust between the identity provider and the relying party
  • Identity 2.0 - user-centric identity models such as InfoCard

Sxip Identity has shipped v1.0 of their technology but has gotten feedback that its customers would like it to be a standard. They have now begun to investigate what it would mean to standardize their solution. One of their customers is Ning who used their technology to add identity management to their site in 12 minutes.


Categories: Trip Report

These are my notes from the session Artificial, Artificial Intelligence: What It Is and What It Means for the Web by L. F. (Felipe) Cabrera, Ph.D.

This was a talk about Amazon's Mechanical Turk. The session began with the original story of the mechanical turk, which was a hoax perpetrated in 1769 by Wolfgang von Kempelen. The original mechanical turk was a chess playing automaton that turned out to be powered by a diminutive human chess master as opposed to 'artificial intelligence'. In a sense it was artificial artificial intelligence.

There are lots of questions computers can answer for Amazon's customers, such as "where is my order?" or "what would you recommend for me based on my music/book tastes?". However, there are other questions a computer can't answer well today, such as "is this a picture of a chair or a table?". Amazon's Mechanical Turk provides a set of web APIs that enable developers to harness human intelligence to answer questions that cannot be answered by computers. The service has grown moderately popular and now has about 23,000 people who answer questions asked via the API.

The service offers benefits to developers by giving them a chance to enhance their applications with knowledge computers can't provide, to businesses by offering them new ways to solve business problems and to end users who can make money by answering questions asked via the API.

Examples of businesses which have used the API include a translation service that uses the API to check the accuracy of translations, polling companies testing opinion polls and purchasers of search advertising testing which search keywords best match their ads.
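The talk didn't cover the API's wire details, so the following is only a shape sketch of the ask/answer model, i.e. a requester posts a question computers can't answer and a human worker claims and answers it (all names are hypothetical, not the actual Mechanical Turk API):

```python
# Shape sketch of a "human task" queue: requesters post questions with a
# reward, workers claim and answer them, requesters collect the results.

class HumanTaskQueue:
    def __init__(self):
        self._pending = []         # (question, reward_cents) awaiting a worker
        self._answers = {}         # question -> answer

    def post(self, question, reward_cents):
        self._pending.append((question, reward_cents))

    def claim(self):
        """A worker takes the next open question, or None if none remain."""
        return self._pending.pop(0) if self._pending else None

    def answer(self, question, answer):
        self._answers[question] = answer

    def result(self, question):
        return self._answers.get(question)
```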


Categories: Trip Report

These are my notes from the session The Future of Interfaces Is Multi-Touch by Jeff Han.

This was mainly a demo of touch screen computing with a screen that supported multiple points of contact. Touchscreens and applications today only support a single point of contact (i.e. the current location of the mouse pointer). There is a lot of potential that can be explored when a touchscreen that supports multiple points of contact at once is used. For example, truly collaborative computer usage between multiple people is possible.

Most of the content of the talk can be gleaned from the MultiTouch Interaction Research video on YouTube. Since I had already seen the video, I zoned out during most of the talk.


Categories: Trip Report

These are my notes from the Simple Bridge-building session by Ray Ozzie.

Ray started off by talking about how he sees RSS as having the potential to be the connective tissue between web sites. The increased usage of RSS and mashups is part of a trend of building 'composite applications' on the Web. Although Microsoft and other tools vendors have done a good job selling tools to enterprises for building composite applications, it has been RSS and mashups that have brought these trends to power users and hobbyist developers. Ray believes the next step is bringing the power of composite applications to end users. He explained that this idea isn't far fetched given that UNIX pipes are a manifestation of composite applications surfaced at the end user level.

The Web today is primarily a bunch of web sites which act as data silos. Although there has been some progress made on the data interchange front with XML and microformats, we don't have something as simple and powerful as the clipboard for the Web. So Ray talked to his Concept Development team at Microsoft and asked them to implement a clipboard for the Web that was cross-browser and secure. After a few weeks they came up with a JavaScript-based solution which worked in major browsers like Internet Explorer and Firefox. In addition, the solution was XML-based and harnessed microformats. They christened the technology Live Clipboard.

So how does it work? Basically when a user right-clicks on a special Live Clipboard icon on a website they see the regular right-click menu containing Cut, Copy, Paste, and Select All. However when the information is copied, it is copied to the clipboard as an XML document containing a rich payload which could be a microformat. This enables a bunch of useful scenarios such as cutting and pasting rich data between websites (e.g. events from Eventful to my calendar in Windows Live Mail) or between websites and desktop applications (e.g. events from Eventful to my calendar in Microsoft Outlook).
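A sketch of what copying such a payload might involve: serializing the structured data as an XML document wrapping a microformat-style event (the element names below are illustrative, not the actual Live Clipboard schema):

```python
# Illustrative sketch of building a clipboard payload: an XML wrapper
# carrying a microformat-style event that another site or app can parse.
import xml.etree.ElementTree as ET

def to_clipboard_xml(event):
    """Serialize an event dict into an XML clipboard payload."""
    root = ET.Element("liveclipboard")
    data = ET.SubElement(root, "data", {"format": "vevent"})
    for field in ("summary", "dtstart", "location"):
        ET.SubElement(data, field).text = event[field]
    return ET.tostring(root, encoding="unicode")
```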

Ray then proceeded to give a series of demos of the Live Clipboard technology. The Concept Development team has created a screencast of a Live Clipboard demo, and a simple web page-based demo that you can test.

More information about Live Clipboard can be obtained from Ray Ozzie's blog post entitled Wiring the Web.


Categories: Trip Report

These are my notes on the Scaling Fast and Cheap - How We Built Flickr session by Cal Henderson.

This was an 8 hour tutorial session which I didn't attend. However I did get a summary of the slide deck in my swag bag. Below are my summaries of the deck Cal presented at the tutorial.

Overview and Environments
Flickr is a photo sharing application that started off as a massively multiplayer online game called Game Never Ending (GNE). It has 2 million users and over 100 million photos. The site was acquired by Yahoo! in May 2005.

A key lesson they learned is that premature optimization is the root of all evil. Some general rules they've stuck to are

  1. buy commodity hardware
  2. use off-the-shelf software instead of building custom code

When trying to scale the site there were a number of factors that needed to be considered. When buying hardware these factors included availability, lead times, reliability of vendors, and shipping times. Other factors that affected purchase decisions included rack space, power usage, bandwidth, and available network ports in the data center.

Load balancing adds another decision point to the mix. One can purchase an expensive dedicated device such as a Cisco 'Director' or a NetScaler device, or go with a cheap software solution such as Zebra. However the software solution will still require hardware to run on. One can apply several load balancing strategies, both at the layer 4 network level, such as round robin, least connections and least load, and at layer 7 by using URL hashes. Sites also have to investigate using GSLB, AkaDNS and LB Trees for dealing with load balancing at a large scale. Finally, there are non-Web related load balancing issues that need to be managed as well, such as database or mail server load balancing.
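The layer 7 "URL hash" strategy mentioned above can be sketched as hashing the request URL to pin it to one backend, which keeps any per-URL caches on that backend warm (a simplistic modulo scheme for illustration, not necessarily what Flickr ran):

```python
# Sketch of layer-7 URL-hash load balancing: hashing the request URL picks
# a backend deterministically, so the same URL always hits the same server.
import hashlib

def pick_backend(url, backends):
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return backends[int(digest, 16) % len(backends)]
```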

The Flickr team made the following software choices

  • PHP 4 (specifically PHP 4.3.10)
  • Linux (2.4 kernel on x86_64 and 2.6 kernel on i386)
  • MySQL 4/4.1 with InnoDB and Character sets
  • Apache 2 with mod_php and prefork MPM

There were 3 rules in their software process

  1. Use source control
  2. Have a one step build process
  3. Use a bug tracker

Everything goes into source control from code and documentation to configuration files and build tools.

For development platforms they chose an approach that supports rapid iteration but enforces some rigor. They suggest having a minimum of 3 platforms

  • Development: Working copy of the site which is currently being worked on
  • Staging: Almost live version of the site where changes to the live site are tested before deployment
  • Production: The customer facing site

Release management consists of staging the application, testing the app on the staging site then deploying the application to the production servers after successful test passes.

Everything should be tracked using the bug tracker including bugs, feature requests, support cases and ops related work items. The main metadata for the bug should be title, notes, status, owner and assigning party. Bug tracking software ranges from simple and non-free applications like FogBugz to complex, open source applications like Bugzilla.

Consistent coding standards are more valuable than choosing the right coding standards. Set standards for file names, DB table names, function names, variable names, comments, indentation, etc. Consistency is good.

Testing web applications is hard. They use unit testing for discrete/complex functions and automate as much as they can, such as testing the public APIs. The WWW::Mechanize library has been useful in testing Flickr.

Data and Protocols
Unicode is important for internationalization of a site. UTF-8 is an encoding [not a character set] which is compatible with ASCII. Making a UTF-8 web application is tricky due to inconsistent support in the various layers of a web application; HTML, XML, JavaScript, PHP, MySQL, and email all have to be made to support Unicode. For the most part this was straightforward, except for PHP, which needed custom functions added, and XML, which required filtering out characters below 0x20 [except for carriage returns, which are normalized]. A data integrity policy is needed, as well as processes for filtering out garbage input from the various layers of the system.
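The control-character filtering can be sketched as follows; keeping tab and newline while normalizing carriage returns is my reading of the slides, so treat the exact whitelist as an assumption:

```python
# Sketch of cleaning text before it goes into XML: normalize carriage
# returns, then strip code points below 0x20 that are illegal in XML
# (tab \x09 and newline \x0a are kept).
import re

_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def clean_for_xml(text):
    text = text.replace("\r\n", "\n").replace("\r", "\n")   # normalize CRs
    return _ILLEGAL.sub("", text)
```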

Filtering bad input doesn't just refer to Unicode. One also has to filter user input to prevent SQL injection and Cross Site Scripting (XSS) attacks.
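The two defenses can be sketched with a parameterized query (the driver quotes the value, so injection strings match nothing) and HTML-escaping before output (the schema and function names are illustrative):

```python
# Sketch of SQL injection and XSS defenses: bind parameters instead of
# building SQL strings, and escape user text before echoing it into HTML.
import html
import sqlite3

def find_user(conn, name):
    # Placeholders let the driver quote the value; never interpolate it.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()

def render_comment(comment):
    # Escape <, >, &, quotes so user input can't inject markup or script.
    return "<p>%s</p>" % html.escape(comment)
```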

The ability to receive email has been very useful to Flickr in a number of scenarios such as enabling mobile 'blogging' and support tracking. Their advice for supporting email is to leverage existing technology and not write an SMTP server from scratch. However you may need to handle parsing MIME yourself because support is weak in some platforms. For Flickr, PEAR's Mail::mimeDecode was satisfactory although deficient. You will also have to worry about uuencoded text and Transport Neutral Encapsulation Format (TNEF) which is only used by Microsoft Outlook. Finally, you may also have to special case mail sent from mobile phones due to idiosyncracies of wireless carriers.

When communicating with other services, XML is a good format to use to ensure interoperability. It is fairly simple unless namespaces are involved. The Flickr team had to hack on PEAR's XML::Parser to make it meet their needs. In situations when XML is not performant enough they use UNIX sockets.

When building services one should always assume the service call will fail. Defensive programming is key. As a consequence, one should endeavor to make service calls asynchronous since they may take a long time to process, and asynchrony makes callers more resilient to failure.
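
One way to sketch that defensive posture (a simplified Python illustration; the retry count and backoff numbers are arbitrary) is to bound each call with a timeout, retry transient failures, and hand back a fallback instead of crashing the caller:

```python
import concurrent.futures
import time

def call_with_retries(fn, retries=3, timeout=2.0, fallback=None):
    # Defensive wrapper for a remote service call: bound its runtime
    # with a timeout, retry transient failures with a small backoff,
    # and return a fallback value rather than propagating the error.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        for attempt in range(retries):
            try:
                return pool.submit(fn).result(timeout=timeout)
            except Exception:
                time.sleep(0.05 * (attempt + 1))
    return fallback

# A stand-in "service" that fails twice before succeeding.
calls = {"n": 0}

def flaky_service():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = call_with_retries(flaky_service)
```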

Developing and Fixing
Discovering bottlenecks is an important aspect of development for web applications. Approaches include

  • CPU usage - rarely happens unless processing images/video, usually fixed by adding RAM
  • Code profiling - rarely causes problems unless doing crypto
  • Query profiling - usually fixed by denormalizing DB tables, adding indexes and DB caching
  • Disk IO - usually fixed by adding more spindles
  • Memory/Swap - usually fixed by adding RAM

Scalability is about handling platform growth, dataset growth and maintainability. There are two broad approaches to scaling: vertical scaling and horizontal scaling. Vertical scaling means buying a big server and, to scale further, buying an even bigger server. Horizontal scaling means buying one server and, to scale, buying more of the same kind of server. In today's world, web applications have embraced horizontal scaling. The issues facing services that adopt horizontal scaling are

  • increased setup/admin cost
  • complexity
  • datacenter issues - power / space / available network ports

  • underutilized hardware - CPU/Disks/Mem may not be used to full capacity

Services need to scale once they hit performance issues. When scaling MySQL one has to worry about

  • Choosing the right backend - MyISAM, BDB, InnoDB, etc
  • Replication
  • Partitioning/Clustering
  • Federation
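
As a tiny illustration of the partitioning idea (hostnames are hypothetical), a common approach is to keep all of one user's rows on a single shard chosen by a stable function of the user id:

```python
# Hypothetical shard hosts; a real deployment would read these from config.
SHARDS = ["db1.example.com", "db2.example.com", "db3.example.com"]

def shard_for_user(user_id: int) -> str:
    # All of a user's rows live on one shard, picked deterministically,
    # so every per-user query touches exactly one database server.
    return SHARDS[user_id % len(SHARDS)]

host = shard_for_user(1042)  # stable: same user, same shard, every time
```

Simple modulo sharding makes adding shards painful since most keys remap; larger sites often use a directory table mapping user to shard instead, trading a lookup for the freedom to move users between servers.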

One big lesson learned about database scalability is that 3rd normal form tends to cause performance problems in large databases. Denormalizing data can give huge performance wins.
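
A toy example of the denormalization win, using SQLite (the schema is invented for illustration): keep a comment_count column on the photos table so a page read never has to JOIN and COUNT over the comments table, at the cost of one extra write per comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photos (id INTEGER PRIMARY KEY, title TEXT,
                     comment_count INTEGER DEFAULT 0);  -- denormalized
CREATE TABLE comments (photo_id INTEGER, body TEXT);
""")

def add_comment(photo_id: int, body: str) -> None:
    # Write twice on insert so reads never have to aggregate.
    conn.execute("INSERT INTO comments VALUES (?, ?)", (photo_id, body))
    conn.execute("UPDATE photos SET comment_count = comment_count + 1 "
                 "WHERE id = ?", (photo_id,))

conn.execute("INSERT INTO photos (id, title) VALUES (1, 'sunset')")
add_comment(1, "nice!")
add_comment(1, "great light")

# Cheap read: one indexed row fetch, no JOIN, no COUNT(*).
count = conn.execute(
    "SELECT comment_count FROM photos WHERE id = 1").fetchone()[0]
```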

The rest of the slides go on about case studies specific to Flickr which are interesting but I don't feel like summarising here. :)

Categories: Trip Report

The panel on what teens want was hosted by Safa Rashtchy, who asked questions of five teenagers: three males and two females.

Earlier in the day, I was chatting with Mike and I pointed out that all through the conference I hadn't heard mention of the kind of Web apps that excite the younger generation. I hadn't heard MySpace mentioned once, and the only time instant messaging came up was in the context of Skype being sold to eBay for $4 billion. The Web 2.0 conference seemed dedicated to applications mostly of interest to the twenty-five and over crowd.

This changed during the session when Safa Rashtchy questioned the teenagers about various aspects of their computer usage. The notes summary below is mainly from memory since I didn't take notes during this session.

Three out of five of the teenagers used MySpace. One of them said he spends all his free time waiting for comments on his space. Another teenager said she had stopped using MySpace when she went to college because it was too "high school" and now she used Facebook, which was more college oriented.

One of the teenagers said he spent up to $50 a month on ring tones. Four of them had iPods and all of them rarely [if ever] paid for music. It seemed they had all tried the iTunes Music Store at one time or another but eventually succumbed to the lure of file sharing networks.

They all used AOL Instant Messenger and one other IM client. Two used MSN Messenger, mainly because they had friends outside the US (Mexico & Brazil) and MSN Messenger is very popular outside the US. One or two used Yahoo! Messenger. None of them used Skype; in fact, they sounded like they had never heard of it and didn't seem interested.

They all used Google for search.

Two of them had used eBay but worried about being ripped off online.  

When asked what kind of applications they'd like to see on the Web, they asked for "more free stuff" and to "get rid of spyware".

The most amusing part of the session was when Safa was trying to find out what eCommerce sites they'd visit. He first asked where they'd buy a cellular phone and each kid said they'd go to the website of their current cellular service provider. Then Safa tried another tack and asked where they'd buy a CD player online and the first kid went "CD Player?" with the same tone of voice and expression on his face I'd have if asked where I could buy a record player online. The audience found this hilarious.

PS: This panel is almost identical to a similar one at the Microsoft Research Social Computing Symposium 2005 held earlier this year. MSR has a video of that panel available online.


Categories: Trip Report

After lunch on Friday, there was a surprise session. John Battelle announced that he was going to have a conversation with Sergey Brin. Throughout the interview Sergey came off as very affable and it's easy to see how he can tell his employees that their corporate motto is "Don't Be Evil" without them questioning its naiveté.

John Battelle started off by asking "It's been a long strange trip to where you are today, how's your head?". Sergey responded that they were very fortunate to have started at Stanford. Being in Silicon valley turned out to be very helpful and influential to the course his life has taken today. When he and Larry first started Google they had planned to open source the Google code. The main reason they decided to start a company was because they needed money to purchase the significant computing resources that the Google search engine needed.

John Battelle then asked Sergey to respond to Terry Semel's comments from the previous day that Google is an extraordinary search engine but as a portal they probably rank as number 4. Sergey responded by jokingly stating that although their cafeteria is nice and they keep trying to improve the quality of the food, they aren't in the top 10 or top 100 restaurants in the world. This elicited loud laughter from the audience.

John followed up by asking Sergey what he thought of the comments by Yusuf Mehdi of Microsoft that they are now the underdog.  Sergey replied that he is very excited that Google is considered a leader in terms of technology. He knows they may not be number 1 when it comes to big business deals or creating huge platforms like Microsoft but they are definitely a technology leader.

John then asked Sergey whether he felt any pressure due to their high share of the Web search market and high stock market valuation. Sergey said he wasn't a valuation expert so he couldn't comment on that. As for high market share in the search market, he is glad that so many people use their search engine based on word of mouth. It shows they have built a quality product. They have some promotional partnerships but for the most part their market share has grown due to the great search experience they provide.

The next question from John was whether Google would keep the clean look on their search page. The response from Sergey was that they will continue with that look on their front page but there will arise the need for other kinds of products from Google. For example, GMail arose out of the need for a better user experience around Web mail. Not only have they improved the web mail experience for GMail users but they have also bettered the lot of users of competing services since competitive responses have increased the mail quota size on various services by 100 times or more.

John began his next question by bringing up a topic that had been an undercurrent in various conversations at the conference. Google has become the new Microsoft, in that they are the 800 lb. gorilla that enters markets and takes them over from existing players. John gave the specific example that the newly launched Google Reader has now scared vendors of web-based RSS readers. Sergey responded by pointing out that when Google enters markets it usually leads to good things for existing parties in the market because small companies get bought and new companies get funded. He used GMail as an example of the entrance of Google into a market leading to a flurry of positive M&A activity. Secondly, Sergey stated that some of their offerings are intended to benefit the Web at large. He said they created AdSense as a way for Web publishers to make money and stay in business. Google had become concerned that a lot of web publishers were going out of business which meant less content on the Web which was bad for their search engine.

The questions from John Battelle ended and a Q&A session with the audience began.

The first question asked was about the rumored office suite being developed by Google. Just like Ray Ozzie and Jonathan Schwartz had done when asked the question, Sergey said he didn't think that it made sense to simply port outdated ideas like the mini-computer to the Web. The audience laughed at the comparison of Microsoft Office to the mini-computer. Sergey did say that Google would likely be creating new kinds of applications that solved similar problems to what people currently use traditional Office suites to solve.

The next question from a member of the audience was whether Sergey thought that click fraud was a big problem for Google. Sergey felt that click fraud wasn't a big problem for Google. He said that like credit card companies they have lots of anti-fraud protections. Additionally, their customers calculate their ROI on using Google's services and know they get value. Finally, he added that the algorithms that power their advertising engine are fairly complex and not easy to game.

Continuing with the "Google as the new Microsoft" meme, the next question from a member of the audience was what markets did Google not plan to enter in the near future so VCs could tell where was safe to invest. Sergey joked that he thought the various markets entered were good investments. His serious response was that Google is a very bottom up company, and their engineers usually end up deciding what becomes products instead of directives from the executives. John Battelle then jumped in and asked if the company wasn't being directed in its recent offerings then how come most of the offerings seem to be echoing the offerings found in traditional portal sites. Sergey's response was that it was probably because Google's engineers wanted to build better products than the existing offerings in the market place.

I asked Sergey that given Terry Semel's comments that search only accounts for about 5% of page views on the Web while content consumption/creation and communications applications made up 40% of user page views each, what was Google's vision for communications and content related applications. Sergey said that Google definitely plans to improve the parts of the Web where people spend a lot of their time which is part of the motivation for them shipping GMail.


Categories: Trip Report

The second From the Labs was presented by Usama Fayyad and Prabhakar Raghavan.

The presentation started by listing a number of Yahoo!'s recent innovative product releases such as Yahoo! 360°, Yahoo! MyWeb 2.0, Yahoo! Mail beta, Yahoo! Music and Yahoo! Messenger with voice. Yahoo! launches hundreds of products a year and recently started spending more resources on research. They want to create a science to explain the various characteristics of the Web which they can use to build innovative products.

So far they have launched the Yahoo! Tech Buzz Game, which debuted at the O'Reilly Emerging Technology Conference earlier this year. It is a fantasy prediction market for high-tech products, concepts, and trends. They also demoed Yahoo! Mindset, which enables you to sort search results by your intent. The example scenario on the website is being able to sort search results for terms like "HDTV" based on whether you are doing research or trying to buy something. This is something Robert Scoble was recently asking for in his blog as the next generation in search. This is very impressive if they can actually scale it out to be more than demoware.

Finally they showed off an application called Tagline which was a visual representation of popular photos and tagging trends in Flickr over time. It was a very flashy looking application but I couldn't see what the practical uses could be.


Categories: Trip Report

The final From the Labs was presented by Alan Eustace and Jason Shellen.

Alan began by stating that Google is focused on innovation, which is why they have small teams [to promote spontaneity] and give engineers 20% time to freely pursue projects they are passionate about. They demoed two recent efforts.

The first was an image recognition engine that could identify the sex of people by their faces using machine learning. They had trained it with over 2 billion images and its accuracy had gotten up to 90%. The long term goal is to enable scenarios such as "identify the people in this picture" and then "find other pictures with this person in them". That would be very cool if they can actually get it to work correctly.

The second project that was demoed was the Google Reader. Actually, it wasn't really demoed. It was announced. Like everyone else I tried to use it by navigating to the site but it was so abominably slow, I gave up.


Categories: Trip Report

The first From the Labs was presented by Gene Becker.

Gene started off by asking how many people in the audience were growing bored with the traditional computing interface of keyboard, mouse and monitor. He called the current computing interface a kazoo when we really need a virtuoso violin. HP Labs is focused on utility and ubiquitous computing. The Web has become increasingly social, diverse, mobile, creative, experiential, contextual, and physical. HP Labs is designing software and hardware for this new world.

He showed a number of interesting developments from HP Labs such as physical hyperlinks, media scapes - digital landscapes overlaid over physical locations by combining gps + wireless + audio + ipaqs, the misto table - an interactive digital coffee table, and virgil - a context aware extensible media driven browser. Gene also mentioned that HP has been making strides in utility computing by renting out their grid networks to animators such as DreamWorks SKG. Their grid has also enabled independent animators to have access to large-scale render farms that would traditionally be out of their price range.


Categories: Trip Report

I attended the discussion on open versus closed models, a panel featuring Danny Rimer, Jeff Barr, Toni Schneider and Sam Schwartz.

Tim O'Reilly began the session by talking about openness and how this is a central theme of Web 2.0. However he pointed out that at the end of the day to have value a company must own something. He then asked the various members of the panel what their company owned.

Jeff Barr said that although Amazon has open APIs they do own their customer database, the buying experience, as well as the procurement and shipping process. Toni Schneider responded that Yahoo! wants to own the user's trust so that users have no qualms about placing their data in Yahoo!'s services. Tim then asked if Yahoo!'s users could export the data they have in Yahoo!'s services. Toni responded that there were ways to get data out of Yahoo!'s services and this was mostly provided based on customer demand. For example, one can export photos uploaded to Flickr but this reduces their worth since the user loses the network effects from being part of the Flickr ecosystem.

Tim's next target was Danny Rimer, who he asked whether Skype wasn't as proprietary as AOL Instant Messenger since its IM protocol isn't based on open standards like Jabber. Danny responded that although the IM protocol is closed, they do have a client-side API. He also stated that the main reason the VOIP protocol isn't more open is that they are still working out the kinks. However, he did note that the Skype API hasn't gotten a lot of traction.

Tim O'Reilly then asked the participants where they resided on the continuum of control versus openness. Sam Schwartz mentioned that there was a delicate balance between the old school and new school of thought at Comcast. Toni said Yahoo! believes in opening up their platform, which is why they created YSDN; however, this doesn't mean throwing things over the wall without support. Tim O'Reilly stated that Yahoo! seemed more intent on controlling things since the primary list of Yahoo! Maps mashups is hosted on a Yahoo! site while the primary list of Google Maps mashups is not hosted on a Google owned website, so Google's API efforts seem more community driven than Yahoo!'s. Toni responded by saying that YSDN is a good first step for Yahoo!. They aren't just about enabling people to put data in their system but also enabling them to get it out as well.

As someone who has had to drive developer efforts at Microsoft, first for the XML team and now at MSN, it is a very delicate balance between enabling the community and dominating it. Unlike Tim, I interpret Yahoo!'s efforts as highlighting the efforts of their developer community and also I'd point out that Google does the same as well. It seems weird to criticize companies for highlighting the efforts of people using their platform.

Tim then asked what VCs like him were looking for in today's startups. Danny replied that he is now primarily interested in companies from geographies outside the United States such as China and Israel. Sam responded that Comcast is looking for people and services that use a lot of broadband resources.

Tim followed up by asking about business models; it seemed to him that the goal of a lot of startups, such as Toni's Oddpost, was to be bought by a big company. Toni agreed that a lot of people were building applications without a business model. It was also argued by the group that we need better business models for Web 2.0 besides affiliate programs like those used by Amazon and eBay. Danny argued that it isn't a bad thing if new startups end up being fuel for innovation at big companies by being purchased. My assumption is that since he's a VC he gets paid either way. ;)


Categories: Trip Report

The session on Open Source and Web 2.0 was an interview of Mitchell Baker and Jonathan Schwartz by Tim O'Reilly.

By the end of this talk I was pretty stunned by some of the things Jonathan Schwartz said. He sounded like a dot bomb CEO from 1999. If I had any SUNW stock, I definitely would have ditched it after the talk.

Tim O'Reilly began the session by stating that the title of the talk was misleading. He asked for a show of hands how many people used Linux and then for how many people used Google. A lot more hands showed up for Google users than Linux users. Tim said that while people tend to limit their idea of Linux applications to desktop software like the Gimp, Google is probably the most popular Linux app. So the discussion is really about relationship between Open Source and Web 2.0.

Tim began the interview by asking Jonathan Schwartz what he felt was fresh about Sun's Open Source strategy. Jonathan said that the key thing behind Open Source's rise isn't the availability of source code but because it is available for free (as in beer). This is what is so cool about Google, they provide their services online for free which increases their reach. Sun is embracing this notion to increase the usage of its software.

Tim then asked Jonathan to talk about Sun's grid computing efforts. Jonathan said they recently moved to a self service model for their computing grid. Customers no longer need to sign contracts up front; instead they just need to go to a webpage on Sun's website and have their credit card ready. Tim O'Reilly commented that customer self service is one of the pillars of Web 2.0. Since Sun moved to this model they have sold out their grid services, primarily to Texas oil & gas companies wanting to run simulations related to Hurricane Rita. Sun's goal is for their grid to target the long tail, in which case Stanford students working on their Ph.D.s and the like may become their primary customers.

The previous night Tim O'Reilly had asked Ray Ozzie if he felt new revenue models such as ad-supported Web-based software would make as much money as old revenue models such as selling shrinkwrapped software. Tim continued with this theme by asking Jonathan if he thought that Sun's new revenue model of renting out their grid would bring in as much money as their old model of selling hardware. Jonathan said their internal models show that the grid business will be very profitable for them. At this point Mitchell Baker jumped into the conversation to add that the old models currently suffer from needing a control structure which eats into revenue. Controls such as DRM, anti-piracy measures, EULAs, etc add cost to existing business models and once we move to more open models based on customer self service the savings will be huge.

Tim then asked Mitchell whether she thought that Firefox's trump card was the fact that anyone can modify the application to meet their needs since it is open source and customizable. Mitchell replied that it was less about source code availability and more about the culture of participation within the Firefox community.  

Tim then asked Mitchell if she felt that Greasemonkey would be widely adopted. Mitchell said she thought that would be extremely unlikely. She pointed out that the average user already is confused by the difference between their web browser and a web page let alone adding something as complex as Greasemonkey into the mix. I have to agree with Mitchell here, I recently found out that a surprising number of end users navigate the Web by entering URLs into search boxes on various web search engines instead of using the address bar of their browser. The web is already too confusing to these users let alone 'remixing' the Web using applications like Greasemonkey.

A number of times while he was speaking, Tim O'Reilly gave the impression that extensions like Greasemonkey are examples of Firefox's superiority as a browser platform. I completely disagree with this notion, and not only because Internet Explorer has Greasemonkey clones like Trixie and Turnabout. The proof is in the fact that the average piece of Windows spyware actually consists of most of the core functionality of Greasemonkey. The big difference is that Firefox has a community of web developers and hobbyists who build cool applications for it while most of the folks extending Internet Explorer in the Windows world are writing spyware and other kinds of malware.

It isn't the availability of the source that's important. It's the community around the application.

The next question was for Jonathan and it was about the recent announcement between Sun and Google. Jonathan started by stating that although many people wanted the announcement to be about an AJAX Office suite, that wasn't on the horizon. He said the deal was about distribution and communities, which are very important. He pointed out that there are a number of widely distributed and near ubiquitous platforms such as Flash and Java which aren't Open Source. Having a wide distribution network with Java deployed on many desktops meant that one could automatically download new applications such as a VOIP client or toolbar application onto any desktop with Java or StarOffice installed. Mitchell jumped in to point out that well-distributed but lousy products don't work. She went on to add that the new distribution model is no longer about being distributed with the OS but instead is powered by word of mouth on the Web. Firefox has gotten 90 million downloads with no traditional distribution mechanisms.

Tim asked Mitchell whether there would be a return to the browser wars of the nineties where Netscape and Microsoft one-upped each other with incompatible, proprietary new features on a regular basis. Mitchell said there were two things wrong with the browser wars in the nineties; the war itself which led to incompatibilities for web developers and Netscape's defeat which led to stagnation of the Web browser. Mitchell said that Firefox will innovate with new features but they plan to ensure that these features will not be incompatible with other browsers or at least will degrade well in them.

Tim asked Jonathan what was behind the thinking that led him to becoming one of the most senior regular bloggers in the technology industry? Jonathan replied that he believes very strongly in community. He felt that developers don't buy things, they join movements. In this case, Sun's transparency is a competitive weapon. This is especially true when they can't compete with $500 million to $1 billion marketing budgets of companies like Microsoft and IBM.

Tim asked whether Jonathan's blog is always transparent and whether he never attempts to mislead or provoke. Jonathan said that he definitely provokes but never misdirects. Even so, the legal department at Sun doesn't read his entries before he posts them, although a bunch of lawyers now have him on their speed dial and often ask him to include disclaimers in his posts.

Tim then asked Mitchell whether the large number of Google employees working on Firefox caused problems since the company is notoriously secretive. Mitchell responded by pointing out that there are people from lots of different companies working on Firefox, it's just that the Google folks get the most press. All the Google folks are still active on the core of the browser and they know that anything that goes into the core must be open for discussion. She stated that if they began to be secretive about code that would be shipping in the core of the browser then they'd be asked to put those changes in extensions instead.

The questions ended and the Q & A session began. I asked a question each of Mitchell and Jonathan.

My question for Mitchell was that, given that the rise of AJAX is primarily because Firefox copied the XMLHttpRequest object from Internet Explorer, was there a policy of keeping abreast of innovations in IE? "Not always", was Mitchell's answer. On the one hand they did copy XMLHttpRequest, but on the other hand they didn't clone ActiveX even though they took a lot of heat for not doing so. Given all the security woes with ActiveX, she felt that in retrospect they had made the right decision.

My question for Jonathan was why he dismissed the idea of an AJAX Office suite earlier during the talk. Jonathan said he thought that in some cases not every application transferred well to the Web as an AJAX application. He gave examples of Firefox and Photoshop as applications that wouldn't make sense to build as AJAX applications.

Another member of the audience asked what Sun had learned from Netscape's open sourcing of Mozilla in their efforts. Jonathan replied that everything Sun produces from now on will be Open Source. He encouraged companies to join the Open Source community since he saw no down side. His goal was to get as wide a distribution as possible and then figure out how to give value to their shareholders after that.


Categories: Trip Report

I attended the session where Tim O'Reilly interviewed Terry Semel, who is the chairman and CEO of Yahoo! Inc. I was very impressed with Terry Semel's talk and it very much cemented for me that Yahoo! will be a company to watch over the next few years. I hope my notes do his words justice.

Terry joined Yahoo! after spending years in Hollywood because he wanted a change and saw the immense opportunity for advertising on the Web. Tim O'Reilly pointed out that some members of the mainstream media have been worried about some of Yahoo!'s media moves such as the recent hiring of Lloyd Braun, former Chairman of ABC Entertainment Television Group, to run Yahoo!'s Media Group. Tim asked if Terry was trying to turn Yahoo! into the interactive studio of the future.

Terry said he isn't sure what exactly Tim meant by an "interactive studio". Terry pointed out that when he was in Hollywood, they cared about two things; content and distribution. Technology wasn't a big deal, it was something that changed every decade or so that the studios could take advantage of. In the 21st century, Terry believes there are now three pillars of media; content, distribution and technology. Yahoo! is in distribution since it reaches over 400 million people. Yahoo! delivers content. Yahoo! is a technology company. Yahoo! is a 21st century technology company that drives great media.

Tim then asked whether the fact that Yahoo! hires reporters doesn't put them in conflict with traditional news organizations like CNN or the LA Times. Terry responded that Yahoo! is all about content. Sometimes it is user generated content. Sometimes it is licensed content from a media company. And sometimes it is content created by Yahoo! as they experiment with discovering the future of content generation. Yahoo! wants to take a leadership role in redefining the nature of content generation. Terry gave the example of travel reviews and how it is often more authentic to obtain user generated content and photos about a trip than a professional travel reporter's opinion. Yahoo! is trying to enable all those scenarios with their various offerings.

Tim then asked how Yahoo! can reconcile the difference between their role as a service provider with being a news organization. Tim brought up the recent incident where Yahoo! turned over information about a Chinese dissident which led to him being jailed. Tim argued that traditional news organizations would not have given up the information. Terry began by clarifying that 99% or more of Yahoo!'s news content is syndicated from other sources and they aren't a news organization. Secondly, any organization that operates in China has to observe the rules and regulations of the land. This doesn't just apply to China but to any country a company does business in, from the EU to the United States. Although some of these laws may be unsettling to him, the fact is that those are the laws in those lands and everyone who lives in and/or operates a business in these countries knows the law. Terry also does not agree with the opinion that Western companies shouldn't do business with China. Exposing Chinese audiences to Western cultures is much better than a policy of isolation. After all, who would have thought that the Berlin Wall and the Iron Curtain would fall?

Tim asked what Terry thought about Google. Terry gave Google credit for doing great work in search. Yahoo! just got into it 18 months ago, and other companies such as Microsoft are coming along as well. If he were Google he would worry about the fact that only 5% of page views on the Web are from search yet they account for about 40% of the revenue generated on the Web. Google realizes that they have to diversify and become a portal, which is why they now have offerings like maps, shopping, email, customizable homepages, etc. However, since they are becoming a portal one should rate them as a portal and not just as a search engine. And as a portal, they would probably rank fourth behind portals such as Yahoo! and MSN. Yahoo! has a number of superior offerings from shopping to mail. Yahoo!'s revamped email offering was rated as superior to GMail by the Wall Street Journal and Yahoo! Mail has 10 times the user base of GMail.

The fact is that people spend 40% of their time online consuming content and another 40% in communications programs. There seems to be a great opportunity to monetize the time Web users spend online that hasn't been seized yet. Yahoo! plans to seize this opportunity.

Tim asked what Terry thought about the fact that the stock market is rating Google higher than Yahoo! when it comes to market capitalization. Terry stated that the market is currently focused on search and the revenue from search but there are avenues for deeper engagement with end users with richer opportunities when it comes to communications and content.

Tim then asked if Terry would consider giving Google access to the information in HotJobs if they launched jobs.google.com? Terry responded that Yahoo! is more of an open platform company than Google and has deeply embraced syndication technologies like RSS.

There was a brief Q & A after the interview.

I asked Terry what he felt was Yahoo!'s biggest strength and Google's biggest weakness. I also pointed out that I agreed with his statement that as portals go Google is #4, which prompted him to ask if I worked at Yahoo!; when I responded that I worked at MSN, this seemed to make the audience laugh. Terry responded that he doesn't talk about the weaknesses of his competitors in public BUT he would say that Yahoo!'s strength is its assets that encourage deep consumer engagement, particularly with regards to communications and content.

Another audience member asked whether Yahoo! considered user generated content to be important. Terry stated that they did which is why they have offerings that enable people to share their experiences with others such as Flickr and Yahoo! 360°. Terry stated that people tend to polarize discussions, such as asking whether to bet on branded advertising or sponsored ads in search. It isn't an either-or situation. It's like asking a parent to pick which of their genius kids is the favorite. Can't they love both? If sponsored search ads are huge then Yahoo! benefits. If branded advertising becomes huge then Yahoo! benefits as well.


Categories: Trip Report

I attended the Web 2.0 dinner hosted by Ray Ozzie, Gary Flake, and Yusuf Mehdi.

During the dinner there was a Q & A session with the Microsoft folks with John Battelle and Tim O'Reilly asking the questions. My notes below are heavily paraphrased since I took the notes on my phone.

Q: Over the past year the sentiment is slowly becoming stronger that Microsoft isn't the dominant player it once was. How does it feel not to be the big dog?

A: Yusuf -  It's great to be the underdog.

Q: How do you feel about some of the new contenders in today's market?

A: Ray - The software industry is changing and Microsoft will have to adapt. Old business models are giving way to new ones and we will have to pay attention to them.

Q: MSN used to be a discarded group within Microsoft but now it is getting a lot of focus. How will MSN make a difference?

A: Gary - Web 2.0 is about software ecosystems and developer platforms. Microsoft, and hence MSN, has a lot of experience when it comes to fostering developer platforms and software ecosystems.

Q: What do you think of the fact that big money makers like Office & Windows are imperilled by Web-based offerings and the new markets may not be as profitable as existing ones?

A: At Microsoft we are big believers in the value of Web-based services and the new business models they present. However we are also investing in our existing products like Office and Windows.

Q: When will we see Office on the Web?

A: Ray - It is a process. It makes sense to move some stuff to the Web such as email but other applications such as graphics editing will likely be on the desktop for a while. We are still figuring out how to strike the right balance.

Q: Will we see rapid development from Microsoft?

A: Ray/Yusuf/Gary - It depends on the product. MSN ships software in timeframes measured in months. Other parts of Microsoft are in years. However how long something takes to ship really is a function of the complexity and maturity of the application. The more complex it is, the longer it takes to ship. After all, Netscape used to ship a new browser every couple of months back in the early days of the Web. Now things are a bit different. 

Q: Is Groove technology going to show up in Microsoft products?

A: Ray - It's part of Office and will go through all the things needed to make it part of the suite, including a consistent UI with the other Office apps and localization in dozens of languages. However the model of a smart client that harnesses the power of the network will permeate across the company.

Q: What about new revenue models such as ad-supported offerings?

A: Ray - The new paid search model was pioneered by Overture and perfected by Google. However we as an industry still haven't figured out exactly how much software can be ad-supported vs. paid.

Q: Are you guys buying AOL?

A: No comment.

Q: You recently launched MSN AdCenter, how do you plan to get new advertisers? Will it be due to access to high traffic MSN sites or by undercutting prices?

A: Yusuf - It'll be a little of both. There definitely will be something of a bonanza when it comes to purchasing keywords when we launch the system but in the long term the value of our ad network will be the high traffic from our sites.

Q: What assets does Microsoft have that give it an edge?

A: Ray - We work better together across devices from mobile, PC, etc. We also have the experience of having both consumer offerings and business offerings.

Q: Will there be a free ad-supported version of Office? Perhaps this is the answer to piracy in emerging markets?

A: Gary - Show of hands, how many people want an ad-based office suite? How many people don't? [ed note -- a seemingly even number of people raised their hands both times]. See our problem? Ray - Ads aren't always the answer. It is one thing to show a user ads when they are looking for a product in a search engine and quite another to shove ads in their face when they are creating a document.

Q: In the past, Microsoft has locked users in with proprietary Office formats such as .doc and .xls. Will the next versions of Office support open formats?

A: Ray - There have been two big changes in the next version of Office. The first is that the user interface has been totally revamped while the other is that the file formats are now open XML formats with a liberal license.

Q: Ray, are you having fun at Microsoft?

A: Ray - Yes!!! I thrive in startups and building ideas from scratch but sometimes my ideas are bigger than I can implement and now with Microsoft I have access to lots of resources. I am particularly impressed with the mobile platform, if you haven't checked out Windows Mobile 5.0, you should give it a look.


Categories: Trip Report

I attended the panel on business models for mash-ups hosted by Dave McClure, Jeffrey McManus, Paul Rademacher, and Adam Trachtenberg.

A mash-up used to mean remixing two songs into something new and cool but now the term has been hijacked by geeks to mean mixing two or more web-based data sources and/or services.

Paul Rademacher is the author of the Housing Maps mash-up, which he built as a way to find a house using Craig's List + Google Maps. The data obtained from Craig's List is fetched via screen scraping. Although Craig's List has RSS feeds, they didn't meet his needs. Paul also talked about some of the issues he had with building the site, such as the fact that since most browsers block cross-domain requests made using XMLHttpRequest, a server needs to be set up to aggregate the data instead of all the code running in the browser. The site has been very popular and has garnered over 900,000 unique visitors based solely on word-of-mouth.
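
To make that concrete, here's a rough sketch of the server-side join a Housing Maps-style mash-up ends up doing: scrape the listings, attach map coordinates, and serve the merged result to the browser as a single same-origin resource. The data, field names and geocoding table below are all made up for illustration and aren't taken from Paul's actual code:

```python
# Illustrative server-side aggregation for a listings + maps mash-up.
# The browser's same-origin policy stops XMLHttpRequest from fetching
# the listings site directly, so the server does the scraping/joining
# and exposes one merged feed. All data here is hypothetical.

def merge_listings_with_coordinates(listings, geocoder):
    """Attach (lat, long) to each scraped listing via a geocoding lookup."""
    merged = []
    for listing in listings:
        coords = geocoder.get(listing["address"])
        if coords is None:
            continue  # skip listings we can't place on the map
        merged.append({**listing, "lat": coords[0], "long": coords[1]})
    return merged

# Hypothetical scraped listings and a canned geocoding table standing
# in for a real geocoding service.
listings = [
    {"title": "2BR apartment", "address": "123 Main St"},
    {"title": "Studio", "address": "456 Oak Ave"},
    {"title": "House", "address": "unknown address"},
]
geocoder = {
    "123 Main St": (37.7749, -122.4194),
    "456 Oak Ave": (37.7840, -122.4090),
}

merged = merge_listings_with_coordinates(listings, geocoder)
```

The point of the design is that the cross-site fetch and the caching both happen once, server-side, rather than in every visitor's browser.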

The question was asked as to why he didn't make this a business but instead took a job at Google. He listed a number of very good reasons:

  1. He did not own the data that was powering the application.
  2. The barrier to entry for such an application was low since there was no unique intellectual property or user interface design to his application

I asked whether he'd gotten any angry letters from the legal department at Craig's List and he said they seem to be tolerating him because he drives traffic to their site and caches a bunch of data on his servers so as not to hit their servers with a lot of traffic. 

A related mash-up site which scrapes real estate websites called Trulia was then demoed. A member of the audience asked whether Paul thought the complexity of mash-ups using more than two data sources and/or services increased in a linear or exponential fashion. Paul said he felt it increased in a linear fashion. This segued into a demo of SimplyHired which integrates with a number of sites including PayScale, LinkedIn, job databases, etc.

At this point I asked whether they would have service providers giving their perspective on making money from mash-ups since they are the gating factor because they own the data and/or services mash-ups are built on. The reply was that the eBay & Yahoo folks would give their perspective later.

Then we got a demo of a Google Maps & eBay Motors mash-up. Unlike the Housing Maps mash-up, all the data is queried live instead of cached on the server. eBay has dozens of APIs that encourage people to build against their platform and they have an affiliates program so people can make money from building on their API. We were also shown Unwired Buyer, which is a site that enables you to bid on eBay using your cell phone and even calls you just before an auction is about to close. Adam Trachtenberg pointed out that since there is a Skype API perhaps some enterprising soul could mash-up eBay & Skype.

Jeffrey McManus of Yahoo! pointed out that you don't even need coding skills to build a Yahoo! Maps mash-up since all it takes is an RSS feed with longitude and latitude elements on each item to have it embedded in the map. I asked why, unlike Google Maps and MSN Virtual Earth, Yahoo! Maps doesn't allow users to host the maps on their page nor does there seem to be an avenue for revenue sharing with mash-up authors via syndicated advertising. The response I got was that they polled various developers and there wasn't significant interest in embedding the maps on developers' sites, especially when this would require paying for hosting.
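
As a concrete illustration, here's roughly what generating such a geo-annotated feed looks like. The geo:lat/geo:long element names follow the W3C Basic Geo vocabulary; the exact elements Yahoo! Maps expected may have differed, so treat this as a sketch rather than a spec:

```python
# Sketch of an RSS 2.0 feed whose items carry latitude/longitude
# elements so a mapping service can plot them. Element names follow
# the W3C Basic Geo (WGS84) vocabulary; illustrative only.
import xml.etree.ElementTree as ET

GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"
ET.register_namespace("geo", GEO_NS)

def build_geo_feed(title, items):
    """items: list of (item_title, link, lat, long) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for item_title, link, lat, lon in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = link
        # The geo-annotation that lets a map place this item.
        ET.SubElement(item, f"{{{GEO_NS}}}lat").text = str(lat)
        ET.SubElement(item, f"{{{GEO_NS}}}long").text = str(lon)
    return ET.tostring(rss, encoding="unicode")

# Hypothetical example data.
feed = build_geo_feed("Coffee shops", [
    ("Example Cafe", "http://example.com/cafe", 47.6097, -122.3331),
])
```

Everything else about the feed stays ordinary RSS, which is what makes this a no-coding-skills-required mash-up for anyone who already publishes a feed.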

We were then shown a number of mapping mash-ups including a mash-up of the London bombings which used Google Maps, Flickr & RSS feeds of news (the presenter had the poor taste to point out opportunities to place ads on the site), a mash-up from alkemis which mashes Google Maps, A9.com street level photos and traffic cams, and a mash-up from Analygis which integrates census data with Google Maps data.

The following items were then listed as the critical components of mash-ups
 - AJAX (Jeffrey McManus said it isn't key but a few of the guys on the panel felt that at least dynamic UIs are better)
 - APIs
 - Advertising
 - Payment
 - Identity/Acct mgmt
 - Mapping Services
 - Content Hosting
 - Other?

On the topic of identity and account management, the problem of how mash-ups handle user passwords came up. If a website is password protected then users often have to enter their usernames and passwords into third party sites. An example of this was the fact that PayPal used to store lots of username/password information of eBay users, which caused eBay some consternation since they went through a lot of trouble to protect their sensitive data only to have a lot of it being stored on PayPal servers.

eBay's current solution is similar to that used by Microsoft Passport in that applications are expected to have users log in via the eBay website, then the user is redirected to the originating website with a ticket indicating they have been authenticated. I pointed out that although this works fine for websites, it offers no solution for people trying to build desktop applications that are not browser based. The response I got indicated that eBay hasn't solved this problem.
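
For illustration, here's a toy version of that redirect-plus-ticket pattern: the provider authenticates the user and sends them back with a ticket the third-party site can verify without ever seeing the password. The HMAC-based ticket format below is invented for this sketch and is not what eBay or Passport actually used:

```python
# Toy redirect-ticket authentication. The provider mints a signed,
# expiring ticket after login; the relying site verifies it with a
# shared secret. Ticket format is hypothetical, for illustration only.
import hashlib, hmac, time

SHARED_SECRET = b"provider-and-site-shared-secret"  # hypothetical

def issue_ticket(username, now=None):
    """Provider side: mint a ticket after the user logs in."""
    expires = int(now if now is not None else time.time()) + 300
    payload = f"{username}|{expires}"
    sig = hmac.new(SHARED_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"

def verify_ticket(ticket, now=None):
    """Relying site: check signature and expiry; return username or None."""
    username, expires, sig = ticket.rsplit("|", 2)
    payload = f"{username}|{expires}"
    expected = hmac.new(SHARED_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered ticket
    if (now if now is not None else time.time()) > int(expires):
        return None  # expired ticket
    return username
```

The desktop-application gap I raised follows directly from this design: the whole flow assumes a browser that can be bounced between two websites.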

My main comment about this panel is that it didn't meet expectations. I'd expected to hear a discussion about turning mashups [and maybe the web platforms they are built on] into money making businesses. What I got was a show-and-tell of various mapping mashups. Disappointing.


Categories: Trip Report

I attended the first board meeting of the AttentionTrust which was open to all and hosted by Steve Gillmor, Hank Barry and Seth Goldstein.

Steve Gillmor began by talking about the history of the organization and why it got started. As has been stated previously the core goal of the organization is that attention data - that is, data that describes what you're paying attention to - has value, and because it has value, when you give someone your attention you should expect to be given something in return. And just because you give someone your attention, it doesn't mean that they own it. You should expect to get it back.

Seth Goldstein mentioned that they are now officially a non-profit. Seth also mentioned that there is now a post on their group blog that goes into some detail to clarify their intentions. They have worked with the developer of the Outfoxed Firefox extension (Stan James) to build an Attention Recorder plugin for Firefox. This plugin records a user's attention data to their hard drive with the assumption being that eventually there will be AttentionTrust certified companies that users can trade this information with.

Stan James gave a demo of the Attention Recorder plugin and stated that it has two main features

  1. The toolbar button which lights up if you are on an AttentionTrust certified site.
  2. The Attention Recorder logs all a user's web traffic to a particular website. Sites can be excluded from being logged so one doesn't accidentally log access to sensitive websites. One can also configure the sites to which the clickstream logs should be sent.
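
As an illustration of what that clickstream logging involves, here's a toy recorder with an exclusion list so sensitive sites are never logged. The record fields are guesses for illustration; the plugin's actual XML schema was shown at the session but isn't reproduced in these notes:

```python
# Toy clickstream recorder in the spirit of the Attention Recorder:
# every visit becomes a record unless its host is excluded. Field
# names and hosts are hypothetical, not the plugin's real schema.
import time
from urllib.parse import urlparse

EXCLUDED_HOSTS = {"bank.example.com", "mail.example.com"}  # hypothetical

def record_visit(log, url, now=None):
    """Append a visit record unless its host is on the exclusion list."""
    host = urlparse(url).hostname
    if host in EXCLUDED_HOSTS:
        return False  # sensitive site: never logged
    log.append({"url": url, "host": host,
                "time": int(now if now is not None else time.time())})
    return True

log = []
record_visit(log, "http://news.example.com/story", now=1125000000)
record_visit(log, "http://bank.example.com/account", now=1125000060)
```

The exclusion check happening before anything touches the log is the whole point: the sensitive visit leaves no trace at all, rather than being logged and filtered later.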

After Stan's demo, the rest of the session turned into a Q & A session. Below are paraphrased questions and answers.

Q: Why would users share their attention data without getting paid? Is the promise of killer apps that can harness the user's attention data enough for users?

A: The value/price of a user's attention data is up to agreements between the users and companies.

Q: What about legacy attention data?

A: The AttentionTrust has partnered with companies that have some attention data today such as Rojo to see if they can expose it in AttentionTrust compliant ways.

There was a brief interlude at this point where Stan James went over some of the implementation details of the Attention Recorder as well as showed examples of the XML format it uses for logging a user's clickstream. On seeing the XML format, Dave Sifry, who was in the audience, also brought up the point of ensuring that sensitive data such as usernames and passwords aren't being sent to services.

Greg Yardley is a part of a lead-generation company building a service to match up users to businesses using attention clickstream data. He's been making sure to follow the principles of AttentionTrust while building their application. For example, users can delete their attention data if needed. Steve Gillmor asked Greg what kinds of apps people could build if they had more access to user's attention data. Greg responded with a lot of examples such as more accurate personalized search engines, searching only over websites the user has seen recently, more accurate data for dating sites to use in matching people up, RSS readers that know to mark items as read if you read them from your browser and a number of other interesting ideas.

A member of the audience asked how AttentionTrust compliant lead-generation companies could be marketed as being better than their traditional alternatives that used slimy methods. The response from Seth Goldstein was that leads generated from attention data would be of higher quality (e.g. leads for mortgage customers generated from people searching for "refinance" are better than leads from people signing up to receive free iPods). Another member of the audience disagreed with Seth and pointed out that it isn't so cut and dried. She pointed out that an unemployed college student could spend their days surfing shopping sites for luxury goods but that doesn't make them a good lead for companies trying to sell luxury goods.

Another user asked what ways exist to convince users to choose AttentionTrust companies. Seth said that people building cool apps about themselves based on local data is probably the key. I jumped into the discussion and used Amazon as an example of end users giving a company their attention data on music and books they like either implicitly (by buying them) or explicitly (by rating them). My question was how could AttentionTrust convince Amazon to open up all their attention data. Steve Gillmor replied that it isn't likely that they'd be able to convince the incumbents to follow the principles of AttentionTrust but if enough small players got together and started building some of these cool apps then great things could happen.

I believe there was also a positive comparison to Richard Stallman and the Free Software Movement but I've forgotten the exact details.


Categories: Trip Report

I attended the panel on Open Source Infrastructure hosted by Marc Canter, Tantek Çelik, Brian Dear, Matt Mullenweg and Toni Schneider.

Marc Canter coined the term "Open Source infrastructure" while pitching OurMedia to big companies while seeking funding. He pointed out that in the Web 2.0 world we are in danger of swapping the lock-in of desktop platforms controlled by big companies like Apple and Microsoft for lock-in of Web platforms controlled by big companies like eBay and Amazon. The same way we have open source platforms to prevent desktop lock-in, we need open source Web infrastructure to prevent platform lock-in.

Brian Dear of EVDB is working on making event publishing on the Web easier. Eventful is a website that aggregates events. The website is built on the EVDB API which is itself built on an EVDB index over the EVDB database. This same API is available for third parties to build applications against. Brian divides events into high definition and low definition events. Low definition events are easy to create and have simple metadata, usually just a title and start/end time for the event. However simple events are hard to discover due to the lack of structured metadata. On the other hand, high definition events have lots of fields (title, start/end time, description, price, recurrence, etc.) which makes them harder to create but easier to discover by applications. They have created SES (simple event sharing) which is a mechanism for web sites to ping an event server with changes, similar to how weblogs currently ping places like Weblogs.com and blo.gs when they are updated.

At this point Marc Canter interjects and asks where the events are located. Will they be able to suck up events from sites like Craig's List or will they have to be on Eventful? Brian Dear states that he prefers the model where aggregators of events such as Eventful point to existing sites such as Craig's List, especially since they are not metadata rich (i.e. low definition events). Marc then points to someone in the audience who has a similar site and asks whether they have a ping server model; he mentions that they crawl the Web instead.

This segued into Matt Mullenweg talking about ping servers. Matt talked about ping-o-matic, which aggregates ping servers so blogs can just ping it and it pings the dozens of ping servers out there instead. However they have significant scaling issues: on some days they get up to 4,000,000 pings. Unsurprisingly, a lot of the pings turn out to be spam. Matt has asked for help from various sources and has gotten servers from Technorati. Marc asks whether pings can grow beyond blogs to events, group creation and other types of pings. Although Matt seemed hesitant, Brian points out that they have already extended the ping format from what sites like Weblogs.com use to accommodate their needs for events.
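
The ping itself is a tiny XML-RPC call: a single weblogUpdates.ping method carrying the blog's name and URL, POSTed to the ping server. Here's a sketch that builds the request body without actually sending it (the endpoint URL would be the ping server's, e.g. ping-o-matic's):

```python
# Build a weblogUpdates.ping XML-RPC request body, the same shape of
# call blogs send to Weblogs.com, blo.gs, ping-o-matic, etc. This only
# constructs the payload; sending it is an HTTP POST to the ping server.
import xmlrpc.client

def build_ping_request(blog_name, blog_url):
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")

# Hypothetical blog.
body = build_ping_request("My Weblog", "http://example.com/blog")
```

The smallness of the payload is part of why the spam problem is so bad: generating millions of fake pings costs a spammer almost nothing.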

Yahoo! just bought Upcoming, which is an event site, and Toni Schneider [who used to be at Oddpost] was representing Yahoo! on the panel. Toni believes that big companies shouldn't own the core services on the Web, which is one of the motivations for Yahoo! opening up their APIs. The biggest developer communities around Yahoo!'s offerings come from the entities they have purchased such as Flickr & Konfabulator. Turning their services into developer platforms is now big at Yahoo!. Another thing that is really big at Yahoo! is using RSS. Yahoo! doesn't care that it isn't an official standard produced by a standards body. It gets the job done. They are also acquiring core online services like blo.gs and keeping them open to benefit the developer world. Marc asks how they decide between working with others (e.g. MediaRSS) versus buying companies (blo.gs and Upcoming). Toni replies that with formats they first look at whether anything exists before creating something new (e.g. GeoRSS used in the Yahoo! Maps API vs. MediaRSS which they created).

Tantek talked about Web 2.0 themes ('you control your own data', 'mix and match data', 'open, interoperable and web friendly data formats & protocols'). Marc points out that the Web is not the end all and be all so 'Web friendly' is cool but not overriding. Tantek also talked about microformat design principles and the various themes within the microformats community (open source tools, open communications and focus on concrete scenarios). He then briefly talked about two microformats. The first was hReview, which is a format for reviews which got input from folks at Yahoo!, SixApart and MSN among others. There are a lot of websites for reviews (Amazon, Yahoo! Local) but no real standard. The second was hCard, which is now being used by http://www.avon.com to mark up the contact information for over 40,000 Avon representatives. Tantek also showed that you could mark up the Web 2.0 speakers list, and then he wrote a bookmarklet that can suck up all the speakers into his address book.
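
Since microformats are just agreed-upon HTML class names (vcard, fn, tel, ...), even a naive parser can lift hCard data out of a page, which is essentially what the bookmarklet demo does. Here's a bare-bones sketch handling only the fn and tel properties on flat markup, with made-up example data; real hCard parsing has many more rules:

```python
# Minimal hCard-style extraction: find elements classed "vcard" and
# collect the "fn" (formatted name) and "tel" properties inside them.
# Example markup is hypothetical; this ignores most of the hCard spec.
from html.parser import HTMLParser

class HCardParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.cards = []        # one dict per vcard found
        self._current = None   # card being filled in
        self._field = None     # property name awaiting its text

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        if "vcard" in classes:
            self._current = {}
            self.cards.append(self._current)
        elif self._current is not None:
            for prop in ("fn", "tel"):
                if prop in classes:
                    self._field = prop

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()
            self._field = None

html = '''
<div class="vcard">
  <span class="fn">Avon Lady</span>
  <span class="tel">555-0100</span>
</div>
'''
parser = HCardParser()
parser.feed(html)
```

The appeal Tantek was selling is visible even in this toy: no new format, no namespaces, just class names layered onto HTML that's already being published.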

During the Q & A session I asked three questions 

  1. Isn't the focus on centralized services for pinging worrying for services with lots of users because it is quite possible for us to overwhelm the services with our user base? Matt responded that this is why he was seeking donations from large companies. 
  2. Currently microformats seem to be going fine since Tantek is the main guy creating them but what happens when others start creating them and unlike XML which has namespaces there is no way to disambiguate them? Tantek responded that there were already many people creating microformats. He also stated that the issue of duplicate or redundant microformats would be dealt with via the community process.
  3. Isn't the big problem with the lack of adoption of standards for creating events, the lack of a killer app for consuming events? Tantek responded that the killer apps may be here already by showing how he wrote an app to consume hCalendar events and place them in his iCal calendar. Brian mentioned that Eventful uses hCalendar and hCard. 

In general, I'd have preferred if the panel was more of a discussion as opposed to an hour or more of sales pitches from the panelists with 10 minutes of Q & A at the end. I have a feeling that a lot more knowledge would have been gained by having members of the audience riff along with the panelists instead of the traditional "wisdom from above" model used in this panel.

An additional meta-comment about the conference is that so far I've been unable to get into 2 out of the 3 sessions I wanted to attend this morning because they were filled to overflowing. Given how much folks are paying for the conference and how much experience the O'Reilly folks have with holding conferences, one wouldn't expect such problems to occur.


Categories: Trip Report

For the few folks that have asked, I have uploaded 50 Photos from my Nigeria Trip to the photo album in my MSN Space. The photos are from all over Nigeria specifically Lagos, Abuja and Abeokuta. They are somewhat crappy since I used a disposable camera I bought at the airport but they do capture the spirit of the vacation.

I guess I should start thinking about investing in a digital camera.

Update: Even though no one asked I've also decided to start rotating a song of the week on my space from the rap group I formed with Big Lo and a couple of other guys back in high school. Just click on the play button on the Windows Media player module to hear this week's track.


Categories: Trip Report

I'm on the way back from my trip and this is the part of the vacation that sucks. It's going to take a total of 5 flights to get from Abuja back to Seattle as well as about half a day of sitting around in airports as well. Below are a bunch of last minute impressions about Nigeria and London (where I'm currently posting this from).
  • All the male restrooms in Heathrow airport have condom dispensers. This really has me scratching my head since the only place I usually see them is in night club restrooms, which makes sense since a bunch of hooking up goes on at night clubs. So now I have this impression that somewhere in Heathrow there is a bunch of debauchery going on and I'm not a part of it. It must be the first class lounges...

  • If you ask a British bartender for a 'Long Island Iced Tea', don't be surprised if he responds "We don't serve tea at the bar, twit!"

  • It seems I've picked up homophobia by osmosis while in the United States. I kept finding it weird that men could be seen holding hands together either for emphasis in a conversation or while walking without being seen as 'gay' in Nigeria. Similarly having guys sleep in the same bed also gave me a similar vibe. I can't believe I'm getting culture shock from my home country.

  • Do you know who cleans the streets of Lagos & Abuja? The street sweepers, literally. I was freaked out to see people with brooms sweeping the sides of the roads in both Abuja and Lagos without the luxury of safety cones. My memory fails me as to whether this is an improvement from not having street sweepers from a few years ago or this was just the status quo.

  • Soft drinks sold in plastic bottles seem to be gaining popularity in Lagos & Abuja. Back in the day it was all about the glass bottles, which were always redeemed by people. In fact, the price of a bottle of beer or a soft drink always assumed you'd be returning a bottle as well. It took me a while to get used to the 'wastefulness' in the United States where people just threw away the bottles. Of course, there were other places where the wastefulness surprised me as well when I first got here such as using paper towels instead of wash rags or styrofoam silverware & plates instead of reusable plastic ones at fast food places. Now it's the other way around. After doing the dishes at my mom's I was confused to not find paper towels nearby. I am becoming so American...

  • Thanks to a ban on external imports of various consumer goods we now get Heineken and Five Alive brewed locally.  Awesome!!!


Categories: Trip Report

August 7, 2005
@ 05:16 PM

I've been doing a bit more travelling around the country this week. The travel high point was a trip by helicopter today to a number of places including a local chief's palace and the village where my dad was born. I took a couple of pics from the helicopter as well as on the ground and hope at least a few of them come out OK.

Below are a couple more random thoughts that have crossed my mind during this trip since my previous post

  • The proliferation of mobile phones is even more significant than I thought. I had assumed it was a city thing since the phones I saw folks with were in Abuja (current capital) and Lagos (former capital). However the less developed areas I've visited have also shown a high proliferation of mobile technology. In my dad's village I saw both a pay-as-you-go booth for MTN, a local mobile service provider, as well as a kiosk where enterprising local entrepreneurs were renting out use of their phones at 20 naira a call (about $0.15).

  • When I was growing up it was common practice for local businessmen to sell products that had been deemed unsafe for public use in developed countries. It seems we now have a new government body called NAFDAC whose job is to act as the Nigerian version of the FDA. NAFDAC has been so effective that there have been multiple attempts on the life of the head of the organization by pissed off business owners whose products she's taken off the market.

  • The only thing scarier than being in a speeding car in typical Lagos or Abuja traffic is being driven in a speeding car in Lagos or Abuja traffic with an in-dashboard DVD player which is showing hip hop videos with half naked chicks dancing seductively. I kept wondering if the driver could keep his eyes on the road. That's it. Next time I come here, I'm walking everywhere.  

  • As I expected the common questions from family and extended family were when I'm going to show up with a future spouse and when I'm going back to school. What I didn't expect was so many people asking when I became such a fat ass. In hindsight, I should have expected it given that I haven't seen some of these folks in almost a decade and I've put on dozens of pounds since then. I definitely need to get back in shape. 



Categories: Trip Report

August 1, 2005
@ 07:03 PM

I've been in Nigeria for almost a week and so far it's been great. I've spent a bunch of time with family and friends, eaten a bunch of stuff I haven't had in years and decided I like MTV in Africa better than what we get in the United States. I've also been taking pictures of everyday life which I'll post to the photo album on my Space once I get back.

Below is a random grab bag of impressions I've had during my trip

  • The traffic scares me. A lot. When being driven in Lagos & Abuja I tend to clench my fists while expecting we'll be in an accident at any minute. I can't get over the fact that as a teenager I used to be able to drive in this chaos and never had an accident. :)

  • The proliferation of mobile phones is insane. There seem to be about half a dozen mobile phone carriers and almost everyone on the streets is carrying one. I was talking to my dad and he said the Nigerian mobile phone market is the second fastest growing in the world after China. About two years ago when I was last here I saw more people downloading ringtones and texting than I'd seen in Seattle & Redmond, and the trend has only continued. I have a bunch of pics of mobile phone ads on the sides of buildings and street hawkers selling pay-as-you-go recharge cards which I'll post once I get back.

  • There is now a large local movie & hip hop scene. The movie scene was blowing up just before I left for college but it now seems to have matured quite a bit. It seems we export movies all over Africa. Folks have started calling the Nigerian movie scene "Nollywood". There are also a ton of local hip hop acts, including one of my high school friends who is now a rapper called Big Lo. About a decade ago he and I were part of a rap group called R.I.G. and I still have some of our tracks on my iPod. It's great to see that at least one of us is living our teenage dream of being a famous rap star.

  • The newspaper headlines seem to focus exclusively on the goings on of the government & politicians or on tragedies involving loss of life. The contrast between that and the kind of stuff I usually see on the cover of USA Today is striking.

  • Inflation is crazy especially in Abuja. Everything seems to cost a couple of hundred or thousands of naira. I still remember when you could get a bottle of Coke or a newspaper for under one naira. Then again, that was about two decades ago.

  • There are a lot of billboards about HIV/AIDS prevention in the capital city of Abuja but almost none in Lagos (the former capital and commercial center). I'll try and get some pics of the billboards before I get back.

  • Almost every PC I've used so far has been infested with spyware. Except for the Powerbook...

  • The London bombings are on people's minds in my social circle. One of my mom's friends lost her only son in the July 7th attacks. My sister and dad were in London during the first bombing and I was pretty rattled when it happened. It's good to see the British police have caught all the suspects from the second attack.

  • The local airline business seems to be thriving as well. Here's another place where there seem to be at least half a dozen competitors driving down prices. It looks like the government airline, Nigeria Airways, is finally out of commission. Good riddance.

  • I miss Nigerian food.


Categories: Trip Report

I missed the first few minutes of this talk.

Bob Wyman of PubSub stated he believed Atom was the future of syndication. Other formats would eventually be legacy formats that would be analogous to RTF in the word processing world. They will be supported but rarely chosen for new efforts in the future.

Mark Fletcher of Bloglines then interjected and pleaded with the audience to stop the practice of providing the same feed in multiple formats. Bob Wyman agreed with his plea and also encouraged members of the audience to pick one format and stick to it. Having the same feed in multiple syndication formats confuses end users who are trying to subscribe to the feed and leads to duplicate items showing up in search engines that specialize in syndication formats like PubSub, Feedster or the Bloglines search features.

A member of the audience responded that he used multiple formats because different aggregators support some formats better than others. Bob Wyman replied that bugs in aggregators should result in putting pressure on RSS aggregator developers to fix them instead of causing confusion to end users by spitting out multiple versions of the same feed. Bob then advocated picking Atom since a lot of lessons had been learned via the IETF process to improve the format. Another audience member mentioned that 95% of his syndication traffic was for his RSS feed not his Atom feed so he knows which format is winning in the market place.

A question was raised about whether the admonition to avoid multiple versions of a feed also included sites that have multiple feeds for separate categories of content. The specific example was having a regular feed and a podcast feed. Bob Wyman thought that this was not a problem. The problem was the same content served in different formats.

The discussion then switched to ads in feeds. Scott Rafer of Feedster said that he agreed with Microsoft's presentation from the previous day that Subscribing is a new paradigm that has come after Browsing and Searching for content. Although we have figured out how to provide ads to support Browse & Search scenarios we are still experimenting with how to provide ads to support the Subscribe scenarios. Some sites like the New York Times use RSS to draw people to their websites by providing excerpts in their feeds. Certain consultants have full text feeds which they view as advertising their services, while others put ads in their feeds. Bob Wyman mentioned that PubSub is waiting to see which mechanism the market settles on for advertising in feeds before deciding on an approach. He added that finding a model for advertising and syndication was imperative so that intermediary services like PubSub, Feedster and Bloglines can continue to exist.

An audience member then followed up and asked why these services couldn't survive by providing free services to the general public and charging corporate users instead of resorting to advertising. The response was that both PubSub and Feedster already have corporate customers who they charge for their services but this revenue is not enough for them to continue providing services to the general public. The Bloglines team considered having fee-based services but discarded the idea because they felt it would be a death-knell for the service given that most service providers on the Web are free not fee-based.

An audience member asked if any of the services would have done anything different two years ago when they started given the knowledge they had now. The answers were that Feedster would have chosen a different back-end architecture, Bloglines would have picked a better name and PubSub would have started a few months to a year sooner.

I asked the speakers what features they felt were either missing in RSS or not being exploited. Mark Fletcher said that he would like to see more usage of the various comment related extensions to RSS which currently aren't supported by Bloglines because they aren't in widespread use. The other speakers mentioned that they will support whatever the market decides is of value.


Scott Gatz of Yahoo! started by pointing out that there are myriad uses for RSS. For this reason he felt that we need more flexible user experiences for RSS that map to these various uses. For example, a filmstrip view is more appropriate for reading a feed of photos than the traditional blog- and news-based user interface typically favored by RSS readers. Yahoo! is definitely thinking about RSS beyond just blogs and news which is why they've been working on Yahoo! Media RSS which is an extension to RSS that makes it better at syndicating digital media content like music and videos. Another aspect of syndication Yahoo! believes is key is the ability to keep people informed about updates independent of where they are or what device they are using. This is one of the reasons Yahoo! purchased the blo.gs service.

Dave Sifry of Technorati stated that he believed the library model of the Web where we talk about documents, directories and so on is outdated. The Web is more like a river or stream of millions of state changes. He then mentioned that some trends to watch that emphasized the changing model of the Web were microformats and tagging.

BEGIN "Talking About Myself in the Third Person"

Steve Gillmor of ZDNet began by pointing out Dare Obasanjo in the audience and saying that Dare was his hero and someone he admired for the work he'd done in the syndication space. Steve then asked why in a recent blog posting Dare had mentioned that he would not support Bloglines' proprietary API for synchronizing a user's subscriptions with a desktop RSS reader but then went on to mention that he would support Newsgator Online's proprietary API. Specifically he wondered why Dare wouldn't work towards a standard instead of supporting proprietary APIs.

At this point Dare joined the three speakers on stage. 

Dare mentioned that from his perspective there were two major problems that confronted users of an RSS reader. The first was that users eventually need to be able to read their subscriptions from multiple computers. This is because many people have multiple computers (e.g. home & work or home & school) from which they read news and blogs. The second problem is that, due to the ease of subscribing to feeds, people eventually succumb to information overload and need a way to see only the most important or interesting content in the feeds to which they are subscribed. This is the "attention problem" that Steve Gillmor is a strong advocate of solving. The issue discussed in Dare's blog post is the former not the latter. The reason for working with the proprietary APIs provided by online RSS readers instead of advocating a standard is that the online RSS readers are the ones in control. At the end of the day, they are the ones that provide the API so they are the ones that have to decide whether they will create a standard or not.

Dare rejoined the audience after speaking.  

END "Talking About Myself in the Third Person"

Dave Sifry followed up by encouraging cooperation between vendors to solve the various problems facing users. He gave an example of Yahoo! working with Marc Canter on digital media as an example.

Steve Gillmor then asked audience members to raise their hand if they felt that the ability to read their subscriptions from multiple computers was a problem they wanted solved. Most of the audience raised their hands in response.

A member of the audience responded to the show of hands by advocating that people use web-based RSS readers like Bloglines. Scott Gatz agreed that using a web-based aggregator was the best way to access one's subscriptions from multiple computers. There was some disagreement between members of the audience and the speakers about whether there are problems using Bloglines from mobile devices which prevent it from being the solution to this problem.

From the audience, Dave Winer asked Dave Sifry why Technorati invented Attention.Xml instead of reusing OPML. The response was that the problem was beyond just synchronizing the list of feeds the user is subscribed to.

Steve Gillmor ended the session by pointing out that once RSS usage becomes widespread someone will have to solve the problem once and for all.  


This was a keynote talk given by Dean Hachamovitch and Amar Gandhi that revealed the RSS platform that will exist in Longhorn and the level of RSS support in Internet Explorer 7, as well as showing some RSS extensions that Microsoft is proposing.

Dean started by talking about Microsoft's history with syndication. In 1997, there was Active Desktop and channels in IE 4.0 & IE 5.0 which weren't really successful. We retreated from the world of syndication for a while after that. Then in 2002, Don Box started blogging on GotDotNet. In 2003, we hired Robert Scoble. In 2004, Channel 9 was launched. Today we have RSS feeds coming out of lots of places at Microsoft. This includes the various feeds on the 15 million blogs on MSN Spaces, the 1500 employee blogs on http://blogs.msdn.com and http://blogs.technet.com, hundreds of feeds on the Microsoft website and even MSN Search which provides RSS feeds for search results.

Using XML syndication is an evolution in the way people interact with content on the web. The first phase was browsing the Web for content using a web browser. Then came searching the Web for content using search engines. And now we have subscribing to content using aggregators. Each step hasn't replaced the previous one but instead has enhanced the user experience of the Web. In Longhorn, Microsoft is betting big on RSS both for end users and for developers in three key ways:

  1. Throughout Windows various experiences will be RSS-enabled and will be easy for end users to consume
  2. An RSS platform will be provided that makes it easy for developers to RSS-enable various scenarios and applications
  3. Increasing the number of scenarios that RSS handles by proposing extensions

Amar then demoed the RSS integration in Internet Explorer 7. Whenever Internet Explorer encounters an RSS feed, a button in the browser chrome lights up which indicates that a feed is available. Clicking on the button shows a user friendly version of the feed that provides rich search, filtering and sorting capabilities. The user can then hit a '+' button and subscribe to the feed. Amar then navigated to http://search.msn.com and searched for "Gnomedex 5.0". Once he got to the search results, the RSS button lit up and he subscribed to the search results. This showed one possible workflow for keeping abreast of news of interest using the RSS capabilities of Internet Explorer 7 and MSN Search.

At this point Amar segued to talk about the Common RSS Feed List. This is a central list of feeds that a user is subscribed to that is accessible to all applications not just Internet Explorer 7. Amar then showed a demo of an altered version of RSS Bandit which used the Common RSS Feed List and could pick up both feeds he'd subscribed to during the previous demo in Internet Explorer 7. I got a shout out from Amar at this point and some applause from the audience for helping with the demo. :)

Dean then started to talk about the power of the enclosure element in RSS 2.0. What is great about it is that it enables one to syndicate all sorts of digital content. One can syndicate video, music, calendar events, contacts, photos and so on using RSS due to the flexibility of enclosures.

Amar then showed a demo using Outlook 2003 and an RSS feed of the Gnomedex schedule he had created. The RSS feed had an item for each event on the schedule and each item had an iCalendar file as an enclosure. Amar had written a 200 line C# program that subscribed to this feed then inserted the events into his Outlook calendar so he could overlay his personal schedule with the Gnomedex schedule. The point of this demo was to show that RSS isn't just for aggregators subscribing to blogs and news sites.
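The core of a demo like that is just walking the feed and pulling out the iCalendar enclosures. A minimal sketch of that half of the idea in Python (the feed content below is invented for illustration; the real demo was a ~200 line C# program that also pushed the events into Outlook):

```python
# Sketch: extract iCalendar enclosures from an RSS 2.0 feed.
# The feed data here is hypothetical, not the actual Gnomedex feed.
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0"><channel>
  <title>Gnomedex Schedule</title>
  <item>
    <title>Opening Keynote</title>
    <enclosure url="http://example.org/keynote.ics"
               type="text/calendar" length="1024"/>
  </item>
</channel></rss>"""

def calendar_enclosures(feed_xml):
    """Return (item title, enclosure URL) pairs for iCalendar enclosures."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        enc = item.find("enclosure")
        if enc is not None and enc.get("type") == "text/calendar":
            results.append((item.findtext("title"), enc.get("url")))
    return results

print(calendar_enclosures(FEED))
```

Each URL returned would then be downloaded and handed to whatever calendar application you want to overlay the schedule on.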

Finally, Dean talked about syndicating lists of content. Today lots of people syndicate Top 10 lists, ToDo lists, music playlists and so on. However RSS is limited in how it can describe the semantics of a rotating list. Specifically the user experience when the list changes such as when a song in a top 10 list leaves the list or moves to another position is pretty bad. I discussed this very issue in a blog post from a few months ago entitled The Netflix Problem: Syndicating Ordered Lists in RSS.

Microsoft has proposed some extensions to RSS 2.0 that allow RSS readers to deal with ordered lists better. A demo was shown that used data from Amazon Web Services to create an RSS feed of an Amazon wish list (the data was converted to RSS feeds with the help of Jeff Barr). The RSS extensions provided information that enabled the demo application to know which fields to use for sorting and/or grouping the items in the wish list feed.
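To make the sorting idea concrete, here is a minimal sketch of what a list-aware reader might do once a feed has declared a sortable field: re-order items by that field instead of document order. The feed snippet and the "rank" element name are invented for illustration, not taken from the spec.

```python
# Sketch: a reader re-sorting feed items by a declared sort field,
# rather than displaying them in document order.
import xml.etree.ElementTree as ET

FEED = """<channel>
  <item><title>Song B</title><rank>2</rank></item>
  <item><title>Song A</title><rank>1</rank></item>
  <item><title>Song C</title><rank>3</rank></item>
</channel>"""

def items_in_list_order(feed_xml, sort_field="rank"):
    root = ET.fromstring(feed_xml)
    items = [(int(i.findtext(sort_field)), i.findtext("title"))
             for i in root.iter("item")]
    # Present the list in its declared order, not the order items
    # happened to appear in the document.
    return [title for _, title in sorted(items)]

print(items_in_list_order(FEED))
```

This is exactly the information a plain RSS 2.0 feed can't convey today: when "Song B" moves to position 3 next week, a list-aware reader can re-sort instead of treating it as a brand new item.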

The Microsoft Simple List Extensions Specification is available on MSDN. In the spirit of the RSS 2.0 specification, the specification is available under the Creative Commons Attribution-ShareAlike License (version 2.5).

A video was then shown of Lawrence Lessig where he commended Microsoft for using a Creative Commons license.

The following is a paraphrasing of the question & answer session after their talk

Q: What syndication formats are supported?
A: The primary flavors of RSS such as RSS 0.91, RSS 1.0 and RSS 2.0 as well as the most recent version of Atom.

Q: How granular is the common feed list?
A: The Longhorn RSS object model models all the data within the RSS specification including some additional metadata. However it is fairly simple with 3 primary classes.

Q: Will Internet Explorer 7 support podcasting?
A: The RSS platform in Longhorn will support downloading enclosures.

Q: What is the community process for working on the specifications?
A: An email address for providing feedback will be posted on the IE Blog. Robert Scoble also plans to create a wiki page on Channel 9.  

Q: What parts of the presentation are in IE 7 (and thus will show up in Windows XP) and what parts are in Longhorn?
A: The RSS features of Internet Explorer 7 such as autodiscovery and the Common RSS Feed List will work in Windows XP. It is unclear whether other pieces such as the RSS Platform Sync Engine will make it to Windows XP.

Q: What are other Microsoft products such as Windows Media Player doing to take advantage of the RSS platform?
A: The RSS platform team is having conversations with other teams at Microsoft to see how they can take advantage of the platform.


These are my notes from the Odeo -- Podcasting for Everyone session by Evan Williams.

Evan Williams was the founder of Blogger and Odeo is his new venture. Just as in his post How Odeo Happened, Evan likened podcasting to audioblogging and jokingly stated that he and Noah Glass invented podcasting with AudioBlogger. Of course, the audience knew he was joking and laughed accordingly. I do wonder though, how many people think that podcasting is simply audioblogging instead of realizing that the true innovation is the time shifting of digital media to the user's player of choice.

The Odeo interface has three buttons across the top; Listen, Sync and Create. Users can choose to listen to a podcast from a directory of podcasts on the site directly from the Web page. They can choose to sync podcasts from the directory down to their iPod using a Web download tool which also creates Odeo specific playlists in iTunes. 

The Odeo directory also contains podcasts that were not created on the site so they can be streamed to users. If third parties would rather not have their podcasts hosted on Odeo they can ask for them to be taken down.  

The Create feature was most interesting. The website allows users to record audio directly on the website without needing any desktop software. This functionality seems to be built with Flash. Users can also save audio or upload MP3s from their hard drive which can then be spliced into their audio recordings. However one cannot mix multiple audio tracks at once (i.e. I can't create an audio post then add in background music later, I can only append new audio).

The revenue model for the site will most likely be by providing hosting and creating services that allow people to charge for access to their podcasts. There was some discussion on hosting music but Evan pointed out that there were already several music sites on the Web and they didn't want to be yet another one.

Odeo will likely be launching in a few weeks but will be invitation-only at first.


This was a late breaking session that was announced shortly after The Long Tail: Conversation with Chris Anderson and Joe Kraus. Unfortunately, I didn't take my tablet PC with me to the long tail session so I don't have any notes from it. Anyway, back to Google Code.

The Google Code session was hosted by Chris DiBona. The Google Code homepage is similar to YSDN in that it tries to put all the Google APIs under a single roof. The site consists of three main parts; information on Google APIs, links to projects Open Sourced by Google that are hosted on SourceForge and highlighted projects created by third parties that use Google's APIs.

The projects Open Sourced by Google are primarily test tools and data structures used internally. They are hosted on SourceForge although there seemed to be some dislike for the features of the site both from Chris and members of the audience. Chris did feel that among the various Open Source project hosting sites existing today, SourceForge was the one most likely to be around in 10 years. He mentioned that Google was ready to devote some resources to helping the SourceForge team improve their service.


Categories: Technology | Trip Report

These are my notes on Introduction to Yahoo! Search Web Services session by Jeremy D. Zawodny

The Yahoo! Search web services are available on the Yahoo! Search Developer Network (YSDN) site. YSDN launched by providing web services that allow applications to interact with local, news, Web, image and video search. Web services for interacting with Y!Q contextual search were launched during ETech.

Jeremy stated that the design goal for their web services was for them to be simple and have a low barrier to entry. It was hoped that this would help foster a community and create two-way communication between the Yahoo! Search team and developers.  To help foster this communication with developers YSDN provides documentation, an SDK, a blog, mailing lists and a wiki.

Requests coming in from client applications are processed by an XML proxy which then sends the queries to the internal Yahoo! servers and returns the results to developers. The XML proxy is written in PHP and is indicative of the trend to move all new development at Yahoo! to PHP.

Some of the challenges in building YSDN were deciding which communication features to provide (wiki vs. mailing list), figuring out licensing issues, and setting quotas on method calls (currently 5,000 calls per day per IP address). In talking to Jeremy after the talk I pointed out that rate limiting by IP penalizes applications used behind a proxy server that make several requests a day such as RSS Bandit being used by Microsoft employees at work. There was a brief discussion about alternate approaches to identifying applications such as cookies or using a machine's MAC address but these all seemed to have issues.
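The proxy problem is easy to see if you sketch the quota bookkeeping. A toy version in Python (the 5,000/day figure is from the talk; the implementation is entirely hypothetical):

```python
# Toy sketch of a per-IP daily call quota. Everything here is
# illustrative; Yahoo!'s actual implementation is not public.
from collections import defaultdict

class DailyQuota:
    def __init__(self, limit=5000):
        self.limit = limit
        self.counts = defaultdict(int)  # (ip, date) -> calls so far

    def allow(self, ip, date):
        key = (ip, date)
        if self.counts[key] >= self.limit:
            return False  # over quota for today; reject
        self.counts[key] += 1
        return True

q = DailyQuota(limit=2)
print([q.allow("10.0.0.1", "2005-03-16") for _ in range(3)])
```

Every client behind one corporate proxy shows up here as the same IP, so a few heavy users can exhaust the quota for everyone else behind it, which is exactly the objection raised after the talk.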

Speaking of RSS, Jeremy mentioned that after they had implemented their Web services which returned a custom document format he realized that many people would want to be able to transform those results to RSS and subscribe to them. So he spoke to the developer responsible, who had RSS output working within 30 minutes. When asked why they didn't just use RSS as their output format instead of coming up with a new format, he responded that they didn't want to extend RSS but instead came up with their own format. Adam Bosworth mentioned afterwards that he thought that it was more disruptive to create a new format instead of reusing RSS and adding one or two extensions to meet their needs.
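It's believable that the transform took 30 minutes, because mapping a custom result format onto RSS 2.0 items is mostly mechanical. A sketch in Python, using an invented input format (the real Yahoo! result schema differs):

```python
# Sketch: wrap items from a hypothetical custom search-result format
# in RSS 2.0 <item> elements. Input format is invented for illustration.
import xml.etree.ElementTree as ET

CUSTOM = """<ResultSet>
  <Result><Title>Foo</Title><Url>http://example.org/foo</Url></Result>
</ResultSet>"""

def to_rss(custom_xml):
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for result in ET.fromstring(custom_xml).iter("Result"):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = result.findtext("Title")
        ET.SubElement(item, "link").text = result.findtext("Url")
    return ET.tostring(rss, encoding="unicode")

print(to_rss(CUSTOM))
```

Which is part of Bosworth's point: if the formats are this close, starting from RSS plus a couple of extension elements would have been less disruptive than inventing a new one.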

Then there was the inevitable REST vs. SOAP discussion. Yahoo! picked REST for their APIs because of its simplicity and low barrier to entry for developers on any platform. Jeremy said that the tipping point for him was when he attended a previous O'Reilly conference and Jeff Barr from Amazon stated that 20% of their API traffic was from SOAP requests but they accounted for 80% of their support calls.

Jeremy ended the talk by showing some sample applications that had been built on the Yahoo! Search web services and suggesting some ideas for members of the audience to try out on their own.


Categories: Trip Report | XML Web Services

These are my notes from the "Just" Use HTTP session by Sam Ruby

The slides for this presentation are available. No summary can do proper justice to this presentation so I'd suggest viewing the slides.

Sam's talk focuses on the various gotchas facing developers building applications using REST or Plain Old XML over HTTP (POX). The top issues include Unicode (both in URIs and XML), escaped HTML in XML content and QNames in XML content. A lot of these gotchas are due to specs containing inconsistencies with other specs or in some cases flat out contradictions. Sam felt that spec writers need to accept that they are responsible for interop and act accordingly.

At the end of the talk Sam suggested that people doing REST/POX would probably be better off using SOAP since toolkits took care of such issues for them. I found this amusing given that the previous talk was by Nelson Minar saying the exact opposite and suggesting that some people using SOAP should probably look at REST.

The one thing I did get out of both talks is that there currently isn't any good guidance on when to use SOAP+WSDL vs. when to use REST or POX in the industry. I see that Joshua Allen has a post entitled The War is Over (WS-* vs. POX/HTTP) which is a good start but needs fleshing out. I'll probably look at putting pen to paper about this in a few months.



Categories: Trip Report | XML Web Services

These are my notes on the Building a New Web Service at Google session by Nelson Minar

Nelson Minar gave a talk about the issues he encountered while shipping the Adwords API. The slides for the talk are available online. I found this talk the most useful of the conference given that within the next year or so I'll be going through the same thing at MSN.

The purpose of the Adwords API was to enable Google customers to manage their Adwords campaigns. In cases where users have large numbers of keywords, it begins to be difficult to manage an ad campaign using the Web interface that Google provides. The API exposes endpoints for campaign management, traffic estimation and reporting. Users have a quota on how many API calls they can make a month which is related to the size of their campaign. There are also some complex authentication requirements since certain customers give permission to third parties to manage their ad campaigns for them. Although the API has only been available for a couple of weeks there are already developers selling tools built on the API.

The technologies used are SOAP, WSDL and SSL. The reason for using SOAP+WSDL was so that XML Web Service toolkits which perform object<->XML data binding could be used by client developers. Ideally developers would write code like

adwords = adwordsSvc.MakeProxy(...)                  # proxy generated from the WSDL
adwords.setMaxKeywordCPC(53843, "flowers", "$8.43")  # typed method call, no XML in sight

without needing to know or understand a lick of XML. Another benefit of SOAP was that it has standard mechanisms for sending metadata (SOAP headers) and errors (SOAP faults).

The two primary ways of using SOAP are rpc/encoded and document/literal. The former treats SOAP as a protocol for transporting programming language method calls just like XML-RPC while the latter treats SOAP as a protocol for sending typed XML documents. According to Nelson, the industry had finally gotten around to figuring out how to interop using rpc/encoded due to the efforts of the SOAP Builders list only for rpc/encoded to fall out of favor and document/literal to become the new hotness.

The problem with document/literal uses of SOAP is that it encourages using the full expressivity of W3C XML Schema Definition (XSD) language. This is in direct contradiction with trying to use SOAP+WSDL for object<->XML mapping since XSD has a large number of concepts that have no analog in object oriented programming.

Languages like Python and PHP have poor to non-existent support for either WSDL or document/literal encoding in SOAP. His scorecard for various toolkits in this regard was

  • Good: .NET, Java (Axis)
  • OK: C++ (gSOAP), PERL (SOAP::Lite)
  • Poor: Python (SOAPpy, ZSI), PHP (many options)

He also gave an example that showed how even things that seem fundamentally simple, such as specifying that an integer element has no value, could cause interoperability problems in various SOAP toolkits. Given the following choices

<foo xsi:nil="true"/>
<foo></foo>
(omitting the <foo> element entirely)
<foo>-1</foo>

The first fails in the current version of the .NET Framework since it maps ints to System.Int32, which is a value type, meaning it can't be null. The second is invalid according to the rules of XSD since an integer cannot be an empty string. The third works in general. The fourth is ghetto but is the least likely to cause problems if your application is coded to treat -1 as meaning the value is non-existent.
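On the receiving side the pragmatic move is to treat all of these conventions the same way when parsing. A defensive-reader sketch in Python (my own illustration, not from the talk's slides):

```python
# Sketch: parse an optional integer element, accepting xsi:nil="true",
# an empty element, a missing element, or a -1 sentinel as "no value".
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

def read_optional_int(parent, name, sentinel=-1):
    el = parent.find(name)
    if el is None:                      # element omitted entirely
        return None
    if el.get(XSI + "nil") == "true":   # xsi:nil="true"
        return None
    text = (el.text or "").strip()
    if text == "":                      # empty element
        return None
    value = int(text)
    return None if value == sentinel else value

doc = ET.fromstring(
    '<r xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<a xsi:nil="true"/><b></b><c>-1</c><d>42</d></r>')
print([read_optional_int(doc, n) for n in ("a", "b", "c", "d", "e")])
```

All four "no value" conventions collapse to None, and only a genuinely present integer survives.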

There are a number of other issues Nelson encountered with SOAP toolkits including

  • Nested complex types cause problems
  • Polymorphic objects cause problems
  • Optional fields cause problems
  • Overloaded methods are forbidden.
  • xsi:type can cause breakage. Favor sending untyped documents instead.
  • WS-* is all over the map.
  • Document/literal support is weak in many languages.

Then came the REST vs. SOAP part of the discussion. To begin he defined what he called 'low REST' (i.e. Plain Old XML over HTTP or POX) and 'high REST'. Low REST implies using HTTP GETs for all API accesses but remembering that GET requests should not have side effects. High REST involves using the four main HTTP verbs (GET, POST, PUT, and DELETE) to manipulate resource representations, using XML documents as message payloads, putting metadata in HTTP headers and using URLs meaningfully.
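The "high REST" style can be sketched with the standard library: the verb carries the operation, the URL names the resource, and the payload is an XML document. A minimal Python sketch (the URL and campaign resource are hypothetical; requests are constructed but never sent):

```python
# Sketch of "high REST": GET/PUT/DELETE against resource URLs with
# XML payloads. The endpoint below is made up for illustration.
import urllib.request

BASE = "http://example.org/campaigns"

def build_request(verb, resource, body=None):
    data = body.encode() if body is not None else None
    req = urllib.request.Request(f"{BASE}/{resource}", data=data, method=verb)
    req.add_header("Content-Type", "application/xml")
    return req

# GET reads a representation; PUT replaces it; DELETE removes it.
get_req = build_request("GET", "42")
put_req = build_request("PUT", "42",
                        "<campaign><maxCpc>8.43</maxCpc></campaign>")
print(get_req.get_method(), put_req.get_method())
```

Note how the second of Nelson's complaints below bites here: constructing a PUT is easy on the client, but plenty of intermediaries and servers only handle GET and POST cleanly.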

Nelson also pointed out some limitations of REST from his perspective.

  • Development becomes burdensome if there is a lot of interactivity in the application (no one wants to write lots of XML parsing code)
  • PUT and DELETE are not implemented uniformly on all clients/web servers.
  • No standard application error mechanism (most REST and POX apps cook up their own XML error document format)
  • URLs have practical length limitations so one can't pass too much data in a GET
  • No WSDL, hence no data binding tools

He noted that for complex data, the XML is what really matters, which is the same whether you are using REST or SOAP's document/literal model. In addition he felt that for read-only APIs, REST was a good choice. After the talk I asked if he thought the Google search APIs should have been REST instead of SOAP and he responded that in hindsight that would have been a better decision. However he doesn't think there have been many complaints about the SOAP API for Google search. He definitely felt that there was a need for more REST tools as well as best practices.

He also mentioned things that went right including :

  • Switch to document/literal
  • Stateless design
  • Having a developer reference guide
  • Developer tokens
  • Thorough interop testing
  • Beta period
  • Batch methods (every method worked with a single item or an array, which led to a 25x speedup for some customers). Dealing with errors in the middle of a batch operation became problematic though.
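The batch-methods win is easy to see in miniature: one call per array instead of one call per keyword. A sketch with simulated call counting (the method names are illustrative, not the actual Adwords API):

```python
# Sketch: single-item calls vs. one batch call for the same update.
# Method names and call logging are invented for illustration.
def set_max_cpc_single(keywords, cpc, call_log):
    for kw in keywords:
        call_log.append(("setMaxKeywordCPC", kw, cpc))   # one API call each

def set_max_cpc_batch(keywords, cpc, call_log):
    call_log.append(("setMaxKeywordCPCBatch", tuple(keywords), cpc))  # one call

keywords = ["flowers", "roses", "tulips", "orchids"]
single, batch = [], []
set_max_cpc_single(keywords, "$8.43", single)
set_max_cpc_batch(keywords, "$8.43", batch)
print(len(single), len(batch))
```

With thousands of keywords per campaign the per-call overhead (HTTP+SSL round trips, SOAP envelope processing) dominates, which is where the reported 25x came from; the flip side is deciding what to do when item 700 of 1,000 in a batch fails.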

There was also a list of things that went wrong 

  • The switch to document/literal cost a lot of time
  • Lack of a common data model
  • Dates and timezones (they allowed users to specify a date to perform operations on but, since dates don't have time zones, depending on when the user sent the request the results might look like they came from the previous or next day)
  • No gzip encoding
  • Having quotas caused customer confusion and anxiety  
  • No developer sandbox which meant developers had to test against real data
  • Using SSL meant that debugging SOAP is difficult since XML on the wire is encrypted (perhaps WS-Security is the answer but so far implementations are spotty)
  • HTTP+SSL is much slower than just HTTP.
  • Using plain text passwords in methods meant that users couldn't just cut & paste SOAP traces to the forum which led to inadvertent password leaks.

This was an awesome talk and I definitely took home some lessons which I plan to share with others at work. 


Categories: Trip Report | XML Web Services

These are my notes on the From the Labs: Google Labs session by Peter Norvig, Ph.D.

Peter started off by pointing out that since Google hires Ph.D's in their regular developer positions they often end up treating their developers as researchers while treating their researchers as developers.

Google's goal is to organize the world's data. Google researchers aid in this goal by helping Google add more data sources to their search engines. They have grown from just searching HTML pages on the Web to searching video files and even desktop search. They are also pushing the envelope when it comes to improving user interfaces for searching such as with Google Maps.

Google Suggest which provides autocomplete for the Google search box was written by a Google developer (not a researcher) using his 20% time. The Google Personalized site allows users to edit a profile which is used to weight search results when displaying them to the user. The example he showed was searching for the term 'vector' and then moving the slider on the page to show more personalized results. Since his profile showed an interest in programming, results related to vector classes in C++ and Java were re-sorted to the top of the search results. I've heard Robert Scoble mention that he'd like to see search engines open up APIs that allow users to tweak search parameters in this way. I'm sure he'd like to give this a whirl. Finally he showed Google Sets which was the first project to show up on the Google Labs site. I remember trying this out when it first showed up and thinking it was magic. The coolest thing to try out is to give it three movies starring the same actor and watch it fill out the rest.


Categories: Technology | Trip Report

These are my notes on the From the Labs: Yahoo! Research Labs session by
Gary William Flake.

Yahoo! has two research sites, Yahoo! Next and Yahoo! Research Labs.  The Yahoo! Next site has links to betas of products that will eventually become products such as Y!Q contextual search, a Firefox version of the Yahoo! toolbar and Yahoo! movie recommendations.

The Yahoo! research team focuses on a number of core research areas including machine learning, collective intelligence, and text mining. They publish papers frequently related to these topics.

Their current primary research project is the Tech Buzz Game, a fantasy prediction market for high-tech products, concepts, and trends. It is in the same vein as other fantasy prediction markets such as the Hollywood Stock Exchange and the Iowa Electronic Markets. The project is being worked on in collaboration with O'Reilly Publishing. A product's buzz is a function of the volume of search queries for that term. People who consistently predict correctly win more virtual money which they can use to place bigger bets. This kind of market is called a dynamic pari-mutuel market.
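For those unfamiliar with the concept, a classic pari-mutuel payout can be sketched as follows. The numbers are invented, and the dynamic variant used by Tech Buzz is more sophisticated than this:

```python
def parimutuel_payout(stakes, winner):
    """Classic pari-mutuel: the whole betting pool is split among
    those who backed the winning outcome, pro rata by stake.
    `stakes` maps outcome -> {bettor: amount}."""
    pool = sum(amount for bets in stakes.values()
               for amount in bets.values())
    winning_bets = stakes[winner]
    winning_total = sum(winning_bets.values())
    return {bettor: pool * amount / winning_total
            for bettor, amount in winning_bets.items()}

# Invented example: 200 units in the pool, split among AJAX backers.
stakes = {"AJAX": {"alice": 60, "bob": 40}, "SOAP": {"carol": 100}}
print(parimutuel_payout(stakes, "AJAX"))  # {'alice': 120.0, 'bob': 80.0}
```

The "dynamic" part of a dynamic pari-mutuel market is that prices move continuously as bets come in rather than being settled in a single closing pool, which is what makes it behave more like a stock exchange.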

The speaker felt Tech Buzz would revolutionize the way auctions are done. This seemed a very bold claim given that I'd never heard of it. Then again, it isn't like I'm an auction geek.


Categories: Technology | Trip Report

These are my notes on the From the Labs: Microsoft Research session by Richard F. Rashid, Ph.D.

Rick decided that in the spirit of ETech, he would focus on Microsoft Research projects that were unlikely to be productized in the conventional sense.

The first project he talked about was SenseCam. This could be considered by some to be the ultimate blogging tool. It records the user's experiences during the day by taking pictures, recording audio and even monitoring the temperature. It has to do some fancy tricks with internal motion detectors to determine when it is appropriate to take a picture, so a picture doesn't end up blurry because the user was moving. Twenty have been built so far, and clinical trials have begun to see if the SenseCam would be useful as an aid to people with severe memory loss.

The second project he discussed was the surface computing project. The core idea around surface computing is turning everyday surfaces such as tabletops or walls into interactive input and/or display devices for computers. Projectors project displays on the surface and cameras detect when objects are placed on the surface which makes the display change accordingly. One video showed a bouncing ball projected on a table which recognized physical barriers such as the human hand when they were placed on the table. Physical objects placed on the table could also become digital objects. For example, placing a magazine on the table would make the computer copy it and when the magazine was removed a projected image of it would remain. This projected image of the magazine could then be interacted with such as by rotating and magnifying the image.

Finally he discussed how Microsoft Research was working with medical researchers looking for a cure for HIV infection. The primary problem with HIV is that it constantly mutates so the immune system and drugs cannot recognize all its forms to neutralize them in the body. This is similar to the spam problem where the rules for determining whether a piece of mail is junk mail keeps changing as spammers change their tactics. Anti-spam techniques have to use a number of pattern matching heuristics to figure out whether a piece of mail is spam or not. MSR is working with AIDS/HIV researchers to see whether such techniques couldn't be used to attack HIV in the human body.


Categories: Technology | Trip Report

These are my notes on the Vertical Search and A9 session by Jeff Bezos.

The core idea behind this talk was powerful yet simple.

Jeff Bezos started off by talking about vertical search. In certain cases, specialized search engines can provide better results than generic search engines. One example is searching Google for Vioxx and performing the same search on a medical search engine such as PubMed. The former returns results that are mainly about class action lawsuits while the latter returns links to various medical publications about Vioxx. For certain users the Google results are what they are looking for, and for others the PubMed results would be considered more relevant.

Currently at A9.com, they give users the ability to search both generic search engines like Google as well as vertical search engines. The selection of search engines is currently small, but they'd like users to be able to build a search homepage that pulls results from thousands of search engines. Users should be able to add any search engine they want to their A9 page and have those results displayed alongside Google or Amazon search results. To facilitate this, A9 now supports displaying results from any search engine that can provide them as RSS. A number of search engines already do this, such as MSN Search and Feedster. Amazon has made some extensions to RSS to support providing search results in feeds. From where I was standing, some of the extension elements I saw include startIndex, resultsPerPage and totalResults.

Amazon is calling this initiative OpenSearch.
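To give an idea of what consuming such a feed might look like, here's a sketch that pulls the paging elements out of a hypothetical OpenSearch-style RSS feed. The element names are the ones I jotted down during the talk, and the namespace URI is a placeholder I made up, not necessarily what A9 will ship:

```python
import xml.etree.ElementTree as ET

# Placeholder namespace -- the real spec may well use a different URI.
NS = {"os": "http://example.org/opensearchrss/"}

FEED = """<rss version="2.0" xmlns:os="http://example.org/opensearchrss/">
  <channel>
    <title>Search results for 'vioxx'</title>
    <os:totalResults>4230</os:totalResults>
    <os:startIndex>1</os:startIndex>
    <os:resultsPerPage>10</os:resultsPerPage>
    <item><title>Result 1</title><link>http://example.org/1</link></item>
  </channel>
</rss>"""

# An aggregator like A9 only needs these three numbers to paginate.
channel = ET.fromstring(FEED).find("channel")
total = int(channel.find("os:totalResults", NS).text)
start = int(channel.find("os:startIndex", NS).text)
per_page = int(channel.find("os:resultsPerPage", NS).text)
print(total, start, per_page)
```

The nice thing about this design is that any site that can already emit RSS only needs to add three extension elements to become a pluggable search source.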

I was totally blown away by this talk when I attended it yesterday. This technology has lots of potential especially since it doesn't seem tied to Amazon in any way so MSN, Yahoo or Google could implement it as well. However there are a number of practical issues to consider. Most search engines make money from ads on their site so creating a mechanism where other sites can repurpose their results would run counter to their business model especially if this was being done by a commercial interest like Amazon.


Categories: Technology | Trip Report

These are my notes on the Remixing Technology at Applied Minds session by W. Daniel Hillis.

This was a presentation by one of the key folks at Applied Minds. It seems they dabble in everything from software to robots. There was an initial demo showing a small crawling robot, where he explained that they discovered that six-legged robots were more stable than those with four legs. Since this wasn't about software I lost interest for several minutes, but did hear the audience clap once or twice for the gadgets he showed.

Towards the end, the speaker started talking about an open marketplace of ideas. The specific scenario he described was the ability to pull up a map and have people's opinions of various places show up overlaid on the map. Given that people are already providing these opinions on the Web today for free, there isn't a need to go through some licensed database of reviews to get this information. The ability to harness the collective consciousness of the World Wide Web in this manner was the promise of the Semantic Web, which the speaker felt was going to be delivered. His talk reminded me a lot of the Committee of Gossips vision of the Semantic Web that Joshua Allen continues to evangelize.

It seems lots of smart people are getting the same ideas about what the Semantic Web should be. Unfortunately, they'll probably have to route around the W3C crowd if they ever want to realize this vision.


Categories: Technology | Trip Report

These are my notes on the The App is the API: Building and Surviving Remixable Applications by Mike Shaver. I believe I heard it announced that the original speaker couldn't make it and the person who gave the talk was a stand in.

This was one of the 15 minute keynotes (aka high order bits). The talk was about Firefox and its extensibility model. Firefox has three main extensibility points: components, RDF data sources and XUL overlays.

Firefox components are similar to Microsoft's COM components. A component has a contract id, which is analogous to a GUID in the COM world. Components can be MIME type handlers, URL scheme handlers, XUL application extensions (e.g. mouse gestures) or inline plugins (similar to ActiveX). The Firefox team is championing a new plugin model similar to ActiveX which is expected to be supported by Opera and Safari as well. User-defined components can override built-in components by claiming their contract id, a process which seemed fragile but which the speaker claimed has worked well so far.

Although RDF is predominantly used as a storage format by both Thunderbird and Firefox, the speaker gave the impression that this decision was a mistake. He repeatedly stated that the graph-based data model was hard for developers to wrap their minds around and that it was too complex for their needs. He also pointed out that whenever they criticized RDF, advocates of the technology [and the Semantic Web] would claim that there were future benefits to be reaped from using RDF.

XUL overlays can be used to add toolbar buttons, tree widget columns and context menus to the Firefox user interface. They can also be used to apply style changes to viewed pages. A popular XUL overlay is GreaseMonkey, which the speaker showed could be used to add features to web sites, such as persistent searches in GMail, all using client-side script. He did warn that overlays which apply style changes are inherently fragile since they depend on processing the HTML of the site, which could change without warning if the site is redesigned. He also mentioned that it was unclear what the versioning model would be for such scripts once new versions of Firefox showed up.


Categories: Technology | Trip Report

These are my notes on the Web Services as a Strategy for Startups: Opening Up and Letting Go by Stewart Butterfield and Cal Henderson.

This was one of the 15 minute keynotes (aka high order bits). The talk was about Flickr and its API.  I came towards the end so I missed the meat of the talk but it seemed the focus of it was showing the interesting applications people have built using the Flickr API. The speakers pointed out that having an API meant that cool features were being added to the site by third parties thus increasing the value and popularity of the site.

There were some interesting statistics, such as the fact that their normal traffic over the API is 2.93 calls per second but can be up to 50-100 calls per second at its peak. They also estimate that about 5% of the website traffic is calls to the Flickr API.


Categories: Trip Report | XML Web Services

These are my notes on the Build Content-centric Applications on RSS, Atom, and the Atom API session by Ben Hammersley.

This was a 3.5 hour tutorial session [which actually only lasted 2.5 hours].

At the beginning, Ben warned the audience that the Atom family of specifications is still being worked on but should begin to enter the finalization stages this month. The specs have been stable for about the last 6 months; however, anything based on work older than that (e.g. anything based on the Atom 0.3 syndication format spec) may be significantly outdated.

He indicated that there were many versions of syndication formats named RSS, mainly due to acrimony and politics in the online syndication space. However, there are basically three major flavors of syndication format: RSS 2.0, RSS 1.0 and Atom.

One thing that sets Atom apart from the other formats is that a number of items which are optional in RSS 1.0 and RSS 2.0 are mandatory in Atom. For example, in RSS 2.0 an item can contain only a <description> and be considered valid, while in RSS 1.0 an item with a blank title and an rdf:about (i.e. link) can be considered valid. This is a big problem for consumers of feeds, since basic information like the date of an item isn't guaranteed to show up.
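Here's a quick sketch of why this matters to a feed consumer. The RSS 2.0 item below is perfectly valid yet gives an aggregator nothing to display or sort by, while an Atom entry must always carry an id, a title and a timestamp (Atom namespace omitted for brevity, and the element names follow the draft spec, so they could still change before finalization):

```python
import xml.etree.ElementTree as ET

# A perfectly valid RSS 2.0 item: nothing but a description.
rss_item = ET.fromstring(
    "<item><description>Something happened.</description></item>")
print(rss_item.findtext("title"))    # None -- nothing to display as a title
print(rss_item.findtext("pubDate"))  # None -- nothing to sort by

# An Atom entry, by contrast, must carry an id, a title and a timestamp,
# so an aggregator always has something to key and display.
atom_entry = ET.fromstring(
    "<entry>"
    "<id>tag:example.org,2005:post-1</id>"
    "<title>Something happened</title>"
    "<updated>2005-03-16T12:00:00Z</updated>"
    "</entry>")
print(atom_entry.findtext("updated"))
```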

There then was a slide attempting to show when to use each syndication format. Ben contended that RSS 2.0 is good for machine-readable lists but not useful for much else outside of displaying information in an aggregator. RSS 1.0 is useful for complex data mining but not for small ad-hoc web feeds. Atom is the best of both worlds: a simple format with strictly defined data.

I was skeptical of this definition especially since the fact that people are using RSS 2.0 for podcasting flies in the face of his contentions about what RSS 2.0 is good for. In talking with some members of the IE team, who attended the talk with me, about this part of the talk later they agreed that Ben didn't present any good examples of use cases that the Atom syndication format satisfied that RSS 2.0 didn't.

Atom has a feed document and an entry document, the latter being a new concept in syndication. Atom also has a reusable syntax for generic constructs (person, link, text, etc.). At this point Marc Canter raised the point that there weren't constructs in Atom for certain popular kinds of data on the Web. Some examples Marc gave were that there are no explicit constructs to handle tags (i.e. folksonomy tags) or digital media. Ben responded that the former could be represented with category elements while the latter could be binary payloads that were either included inline or linked from an entry in the feed.

Trying a different tack, I asked how one represents the metadata for digital content within an entry. For example, I asked about doing album reviews in Atom. How would I provide the metadata for my album review (name, title, review content, album rating) as well as the metadata for the album I was reviewing (artist, album, URL, music sample(s), etc.)? His response was that I should use RSS 1.0 since it is more oriented to resources talking about other resources.

The next part of the talk was about the Atom API, which is now called the Atom publishing protocol. He gave a brief history of weblog APIs, starting with the Blogger API and ending with the MetaWeblog API. He stated that XML-RPC is inelegant while SOAP is "horrible overkill" for solving the problem of posting to a weblog from an API. On the other hand, REST is elegant. The core principle of REST is using the HTTP verbs GET, PUT, POST and DELETE to manipulate representations of resources. In the case of Atom, these representations are Atom entry and feed documents. There are four main URI endpoints: the PostUri, EditUri, FeedUri, and the ResourcePostUri. In a technique reminiscent of RSD, websites that support Atom can point to the API endpoints by using <link> tags with appropriate values for the rel attribute.
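As a sketch of what that autodiscovery might look like from the client side, the snippet below scrapes a page's <link> tags for endpoint URIs. The service.* rel values and the URLs are my assumptions based on the talk, not necessarily what the final protocol will specify:

```python
from html.parser import HTMLParser

# Sketch of Atom endpoint autodiscovery from a weblog's HTML.
# The rel values below follow the draft convention as I understood
# it from the talk; they may change before the protocol is final.
class EndpointFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.endpoints = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel", "").startswith("service."):
            self.endpoints[a["rel"]] = a.get("href")

PAGE = """<html><head>
<link rel="service.post" type="application/atom+xml"
      href="http://example.org/atom/post" />
<link rel="service.feed" type="application/atom+xml"
      href="http://example.org/atom/feed" />
</head><body>my weblog</body></html>"""

finder = EndpointFinder()
finder.feed(PAGE)
print(finder.endpoints)  # discovered PostUri and FeedUri
```

A client would then POST an Atom entry document to the discovered PostUri to create a post, and PUT/DELETE against an EditUri to update or remove it.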

At the end of the talk I asked what the story was for versioning both the Atom syndication format and the publishing protocol. Ben floundered somewhat in answering this question but eventually pointed to the version attribute in an Atom feed. I asked how an application would tell from the version attribute whether it had encountered a newer but backwards compatible version of the spec, or was the intention that clients should only be coded against one version of Atom? His response was that I was 'looking for a technological solution to a social problem' and, more importantly, that there was little chance the Atom specifications would change anyway.

Yeah, right.

During the break, Marc Canter and I talked about the fact that both the Atom syndication format and the Atom publishing protocol are simply not rich enough to support existing blogging tools, let alone future advances in blogging technologies. For example, in MSN Spaces we already have data types such as music lists and photo albums which don't fit the traditional blog entry syndication paradigm that Atom is based upon. More importantly, it is unclear how one would even extend Atom to do this in an acceptable way. Similar issues exist with the API, which already has less functionality than existing APIs such as the MetaWeblog API. It is unclear how one would perform the basic act of querying one's blog for the list of categories used to populate a drop-down list in a rich client, a commonly used feature in such tools, let alone do things like manage one's music list or photo album, which is what I'd eventually like us to do in MSN Spaces.

The conclusion that Marc and I drew was that just to support existing concepts in popular blogging tools, both the Atom syndication format and the Atom API would need to be extended.

There was a break, after which there was a code sample walkthrough which I zoned out on.