Seeing Jon Udell's post about having difficulty with the Google PR team over discussing the Google GData API reminded me that I needed to write down some of my thoughts on extending RSS and Atom based on looking at GData. There are basically three approaches one can take when deciding to extend an XML syndication format such as RSS or Atom:

  1. Add extension elements in a different namespace: This is the traditional approach to extending RSS and it involves adding new elements as children of the item or atom:entry element which carry application/context specific data beyond that provided by the RSS/Atom elements. Microsoft's Simple Sharing Extensions, Apple's iTunes RSS extensions, Yahoo's Media RSS extensions and Google's GData common elements all follow this model.

  2. Provide links to alternate documents/formats as payload: This approach involves providing links to additional data or metadata from an item in the feed. Podcasting is the canonical example of this technique. One argument for this approach is that instead of coming up with extension elements that replicate existing file formats, one should simply embed links to files in the appropriate formats. This argument has been used in various discussions on syndicating calendar information (i.e. iCalendar payloads) and contact lists (i.e. vCard payloads). See James Snell's post Notes: Atom and the Google Data API for more on this topic.

  3. Embed microformats in [X]HTML content: A microformat is structured data embedded within another markup language (typically HTML/XHTML). This allows one to represent both human-readable data and machine-readable data in a single document. The Structured Blogging initiative is an example of this technique.

All three approaches have their pros and cons. Option #1 is problematic because it encourages a proliferation of duplicative extensions and may lead to fragmenting the embedded data into multiple unrelated elements instead of a single document/format. Option #2 requires RSS/Atom clients to either build parsers for non-syndication formats or rely on external libraries for consuming information in the feed. The problem with Option #3 above is that it introduces a dependency on an HTML/XHTML parser for extracting the embedded data from the content of the feed.
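To make the trade-offs concrete, here is a minimal Python sketch of what a client has to do under each option. The feed, the `ex` namespace, its `ex:location` element and the `location` class name are all invented for illustration; none of them come from GData, Media RSS or any real extension.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical item carrying the same fact three ways (all names are made up).
FEED = """<rss version="2.0" xmlns:ex="http://example.com/ns/event">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Team meeting</title>
      <description>&lt;p&gt;Meet in &lt;span class="location"&gt;Room 12&lt;/span&gt;&lt;/p&gt;</description>
      <ex:location>Room 12</ex:location>
      <enclosure url="http://example.com/meeting.ics" type="text/calendar" length="512"/>
    </item>
  </channel>
</rss>"""

NS = {"ex": "http://example.com/ns/event"}


class ClassTextExtractor(HTMLParser):
    """Collects the text of elements carrying a given class (no nesting support)."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.values = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self._capturing = True
            self.values.append("")

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the capture.
        self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.values[-1] += data


def read_item(xml_text):
    item = ET.fromstring(xml_text).find("channel/item")

    # Option 1: extension element in its own namespace, read with the XML parser.
    extension_value = item.findtext("ex:location", namespaces=NS)

    # Option 2: link to a payload in another format; the client still has to
    # fetch it and parse that format (iCalendar here) separately.
    enclosure = item.find("enclosure")
    payload_link = (enclosure.get("url"), enclosure.get("type"))

    # Option 3: microformat inside the HTML content, which needs an HTML parser.
    extractor = ClassTextExtractor("location")
    extractor.feed(item.findtext("description"))

    return extension_value, payload_link, extractor.values
```

Calling `read_item(FEED)` returns the extension value, the payload link and the values pulled from the HTML. Only Option #1 gets its answer from the XML parser alone; the other two need a second parser, which is the dependency described above.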

From my experience with RSS Bandit, I have a preference for Option #1 although there is a certain architectural purity with Option #2 which appeals to me. What do the XML syndication geeks in the audience think about this?


 

The Windows Live Mail Desktop team have a blog post entitled Better Together with Active Search where they talk about a new feature currently being called "Active Search". The post is excerpted below

Much of what you need to get done online – from planning your next vacation to remembering to buy flowers for your mom on her birthday – is piling up in your inbox, just waiting for you to take action, usually by looking something up on the web.

With this in mind, we’ve designed Active Search to make it easier for you to act on anything that piques your interest while reading your email. That’s why we show you key search terms we find in a message and provide a search box right underneath, so you can quickly search for terms of your own.

We also show search results and sponsored links right inline, so you can see what’s related to your message on the web, without having to open a new browser window. Of course, if you come across something really interesting, just click More results… and we’ll open a new window with a full set of search results for you to dive into.

Because we only look for relevant keywords in the current email message or RSS article you happen to be viewing in your inbox, there are times when we just can’t find anything relevant enough to show you. So we don’t – we just show a search box ready for you to enter search terms you happen to come up with while reading the message.

I got a demo of this feature from Bubba in the cafeteria a few weeks ago and it seemed pretty interesting. It reminds me of text ads in GMail, but for a desktop application and with a few other key differences. What I'd love to know is whether there is a plan to make some of this stuff available as APIs for non-Windows Live applications. I wouldn't mind being able to integrate search ads into RSS Bandit to offset some of our hosting costs.


 

Categories: Windows Live

June 2, 2006
@ 03:41 PM

Mark Cuban has a blog post entitled Why I think ClickFraud is far greater than imagined where he lays out some anecdotal reasons why he thinks the sky is falling for Pay-Per-Click (PPC) advertising due to click fraud. He writes

Now i have no idea how much money is being lost to click fraud. All i know is that when the black hat hackers see easy money, they take it. I also know that they are greedy and a jealous bunch. The more they see the more they take, so you can pretty well bet that the amount of click fraud is going up by the minute.

And no amount of IP repetition algorithms are going to stop them.

Again, this is all opinion. No investigative reporting going on. Yet.

I have no hard numbers on how much is being "lost" to click fraud but there are a number of reasons why I'm skeptical about how much attention people pay to scaremongering about click fraud and its effects on companies like Google when 'the market corrects'.

Reason number one is that despite how much people may complain about PPC advertising, it works a lot better than the alternatives. Mark Cuban actually wrote a blog post just a few days ago entitled A quick letter to the Newspaper and Magazine Industries where he complains about how expensive traditional advertising is compared to the returns. If you were trying to drive traffic to your website and had a million dollars to spend, would you spend it on newspaper ads, television ads or AdSense/AdWords ads? As long as the return on investment (ROI) for PPC ads is higher than for other forms of advertising, I suspect advertisers will consider the money lost to click fraud as acceptable losses. This is no different from retail stores which have to accept a certain amount of loss from shoplifting and customer returns yet still remain profitable.
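The ROI argument can be put as back-of-the-envelope arithmetic. The sketch below is only an illustration: the function name and every number fed to it are hypothetical, not measurements of real click fraud or real ad returns.

```python
def ppc_roi(spend, cost_per_click, fraud_rate, revenue_per_real_click):
    """Return on investment for a PPC budget when a share of paid clicks is fraudulent.

    All inputs are assumptions supplied by the caller; nothing here is measured data.
    """
    paid_clicks = spend / cost_per_click
    real_clicks = paid_clicks * (1 - fraud_rate)   # fraudulent clicks earn nothing
    revenue = real_clicks * revenue_per_real_click
    return (revenue - spend) / spend


# Made-up scenario: $1,000 budget, $0.50 per click, $1.00 earned per genuine click.
roi_without_fraud = ppc_roi(1000, 0.50, 0.0, 1.00)
roi_with_fraud = ppc_roi(1000, 0.50, 0.2, 1.00)   # 20% of clicks assumed fraudulent
```

In this invented scenario fraud cuts the return, but the advertiser only walks away if the reduced figure drops below what the alternatives return.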

Another reason I'm skeptical about fearmongering around click fraud is that this isn't the first time technology has made it easy to subvert a market, yet those markets still exist. My employer (Microsoft) has built one of the most profitable businesses in the world selling products that can be copied by anyone with a CD burner and a computer. In college, I don't think I knew many people who paid for their copies of Microsoft Office, yet that business still manages to bring in billions of dollars in profit a year. Then there are other more recent markets like ringtones. This has emerged as a multi-billion dollar industry in the past few years even though it is quite possible for people to get ringtones for free on their cell phones without much hassle. And then there's the iTunes Music Store.

I'd be unsurprised if there is a larger than commonly assumed amount of click fraud going on out there. No more surprised than I'd be to find out that software/music piracy rings exist, that insurance fraud is about 15% to 30% of insurance claims or that shoplifting costs retailers billions of dollars every year. I don't see people ringing the death knell of WalMart or Safeco because of these grim statistics. So the next time you see someone claiming that the sky is falling because of click fraud, remember that fraud exists in every aspect of human commerce and the Web isn't going to change that.


 

While I was in the cafeteria with Mike Vernal this afternoon I bumped into some members of the Windows Desktop Search team. They mentioned that they'd heard that I'd decided to go with Lucene.NET for the search feature of RSS Bandit instead of utilizing WDS. Much to my surprise they were quite supportive of my decision and agreed that Lucene.NET is a better solution for my particular problem than relying on WDS. In addition, they brought an experienced perspective to a question that Torsten and I had begun to ask ourselves. The question was how to deal with languages other than English.

When building a search index, the indexer has to know which stop words it shouldn't index (e.g. a, an, the) and have some knowledge of word boundaries. Where things get tricky is that a user can receive content in multiple languages: you may receive email in Japanese from some friends and in English from others. Similarly, you could subscribe to some feeds in French and others in Chinese. Our original thinking was that we would have to figure out the language of each feed and build a separate search index for each language. This approach seemed error prone for a number of reasons:

  1. Many feeds don't provide information about what language they are in
  2. People tend to mix different languages in their speech and writing. Spanglish anyone?

The Windows Desktop Search folks advised that instead of building a complicated solution that wasn't likely to work correctly in the general case, we should consider simply choosing the indexer based on the locale/language of the Operating System. This is already what we do today to determine what language to display in the UI and we have considered allowing users to change the UI language in future which would also affect the search indexer [if we chose this approach]. This assumes that people read feeds primarily in the same language that they chose for their operating system. This seems like a valid assumption but I'd like to hear from RSS Bandit users if this is indeed the case. 
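A minimal Python sketch of the "pick the indexer from the OS locale" idea follows. It is not RSS Bandit or Lucene.NET code: the tiny stop word lists, the function names and the English fallback are all assumptions made for illustration.

```python
import locale
import re

# Tiny illustrative stop word lists; a real indexer would ship much fuller ones.
STOP_WORDS = {
    "en": {"a", "an", "the", "and", "of"},
    "fr": {"le", "la", "les", "un", "une", "et", "de"},
    "de": {"der", "die", "das", "ein", "eine", "und"},
}
DEFAULT_LANGUAGE = "en"  # assumed fallback when the locale is unknown or unsupported


def language_from_locale(locale_name):
    """Reduce a locale name such as 'fr_FR' or 'de-DE' to a supported language code."""
    if not locale_name:
        return DEFAULT_LANGUAGE
    code = re.split(r"[_\-.]", locale_name)[0].lower()
    return code if code in STOP_WORDS else DEFAULT_LANGUAGE


def system_language():
    """Language taken from the operating system locale, as the WDS team suggested."""
    return language_from_locale(locale.getlocale()[0])


def index_terms(text, language):
    """Split text into lowercase words and drop that language's stop words.

    The \\w+ split is a stand-in for real word-boundary logic; it would not
    segment Japanese or Chinese text correctly.
    """
    words = re.findall(r"\w+", text.lower())
    return [word for word in words if word not in STOP_WORDS[language]]
```

With this shape, every feed is indexed with `index_terms(text, system_language())`, so there is a single index and no per-feed language detection. The cost is that a French feed read on an English system keeps its French stop words in the index.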

If you use the search features of RSS Bandit, I'd appreciate getting your feedback on this issue.


 

Categories: RSS Bandit | Technology

I was just reading the blog post entitled $40,000 Is a Lot of Dollars on the Windows Live Messenger team's blog and saw that we've announced the Invasion of the Robots contest. From the website

Microsoft is challenging developers worldwide to create conversational robots, or BOTs, for MSN® Messenger and Windows Live™ Messenger. The most original, useful robots collect $40,000 in total prizes.

Too bad I'm not eligible to enter the contest. $40,000 sounds like a nice chunk of change for a summer's worth of coding.


 

Categories: Windows Live

May 31, 2006
@ 02:30 PM

2006 looks like another good year for superhero movies. I just saw X-Men: The Last Stand and I'm looking forward to Superman Returns and My Super Ex-Girlfriend. I love the annual summer movie season and this year has been pretty decent so far for movies.  Below is a brief list of movies I've seen this year with a mini-review and a rating.

  • Thank You For Smoking (***** out of *****): Excellent satire. Hilarious because you know it is true. The only dark spot was Katie Holmes who just seems unbelievable in any role outside of Dawson's Creek. She ruined Batman Begins for me as well.

  • Mission: Impossible 3 (**** out of *****): An action packed adrenaline rush. Definitely the best movie in the series.

  • Ice Age: The Meltdown (**** out of *****): As funny as the original. I like how the movie was about global warming but they managed to not include any lectures on the topic in the movie.

  • RV (**** out of *****): Funny, if you liked Chevy Chase's vacation movies from years past. I did.

  • X-Men: The Last Stand (*** out of *****): Not as action packed as the second movie in the series. Having two main plotlines (Jean Grey's Phoenix Saga and a cure for mutants) instead of a single theme didn't help. They definitely tried to go out with a bang but in the final seconds of the movie you could tell that they left the door open for more sequels even if this is the "Last Stand". Lame.

  • The Brothers Grimm (** out of *****): This movie was ridiculously bad. I'm stunned Matt Damon agreed to be in this crap. I didn't bother to finish it.

  • Elizabethtown (** out of *****): Besides the fact that there seemed to be little on-screen chemistry between Kirsten Dunst and Orlando Bloom, this movie was trying too hard to tug at our heart strings. Go watch The Family Stone instead.

  • Date Movie (* out of *****): Rule #1 of parody movies: don't make a parody that is less funny than the original.

What movies have you seen this year that are worth seeing and what would you advise that I avoid given my ratings above?


 

Categories: Movie Review | Personal

Torsten and I have started working on RSS Bandit regularly again. Last weekend I fixed a bunch of bugs including the problem that prevented IE 7 from importing OPML files from RSS Bandit. I've gotten a few emails from folks at work about that particular issue so I thought it would be good to knock it out early. This morning, I checked in support for the Atom Thread Extensions which means I can now see comment counts and view comments inline on Sam Ruby's blog.

One change we're planning to make is to switch to using a full-fledged text search engine to power the search feature of RSS Bandit. Currently, we load all the text in memory and use the .NET Framework's string comparison operators to find the target text. We want to move to a model where files on disk are indexed in the background and we don't have to have stuff in memory to search it. This should significantly reduce the memory consumed by RSS Bandit.
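The difference between the two models can be sketched in a few lines of Python. This is a toy illustration with invented function names, not RSS Bandit's code or the Lucene.NET API; a real engine would also keep the index on disk instead of in a dictionary.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def scan_search(items, term):
    """Scan-everything model: needs the full text of every item in memory."""
    term = term.lower()
    return [item_id for item_id, text in items.items() if term in text.lower()]


def build_index(items):
    """Inverted index: maps each word to the ids of the items containing it."""
    index = defaultdict(set)
    for item_id, text in items.items():
        for word in tokenize(text):
            index[word].add(item_id)
    return index


def index_search(index, query):
    """Return the ids of items that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches
```

Once the index exists, a query only touches the entries for its own words, so the item text itself no longer has to be loaded to answer a search.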

We've investigated a couple of options for our search solution. My first thought was integrating with MSN Windows Desktop Search. After exchanging some mail with various members of the team, I decided that this wouldn't meet our needs for a number of reasons

  • Users will need to have Windows Desktop Search installed so we either need to figure out how to bundle it with RSS Bandit or disable the feature if it is not installed.
  • The indexing service is file-centric. However, we need to index individual RSS/Atom items within the cached RSS/Atom feeds on disk. This means we'd have to change our model to storing one file per RSS/Atom item, which could lead to hundreds or thousands of files per feed.
  • The biggest gotcha was that making the indexer understand the structure of RSS/Atom feeds requires writing a custom IFilter which involves gnarly C++ coding then dealing with hairy COM<->.NET interop issues. Not exactly the kind of work one wants to do in their free time.

After further investigation we've settled on Lucene.NET which doesn't have any of the aforementioned problems. However we have been dealing with some issues that could either be bugs or just a misunderstanding of how the APIs should be used. We'll keep you posted. 


 

Categories: RSS Bandit

In his blog post entitled Announcing Windows Live Gadget SDK James Lau writes

I am very excited to bring you the first public release of the Windows Live Gadget SDK today! You can start using this SDK right now to build Gadgets that run on Live.com...But as many features as we are adding to Live.com, the site is still very much a Gadget platform for you developers out there to build on. We rely on you to build rich and interesting Live Gadgets that we haven't thought of, and to build a strong ecosystem around this platform. Live.com is still in Beta today, but it promises to be one of the most popular Internet destinations when we launch later this year. You can leverage on the high traffic site to extend services beyond your web site by building Gadgets that live on Live.com.

Although we are releasing the SDK today, the Gadget platform and APIs are still changing. And we want to listen to your ideas and feedback to help us build a better platform. Some of the things that we know we need to work on are:

  1. Unified Gadget model - we want to enable developers to write a Gadget once and have it run on both Vista Sidebar and on the web, maybe even in other environments.
  2. Allow 3rd party gadgets to change header and footer - today, all 3rd party Gadgets are hosted within an iframe and do not have access to the title, title icon and footer.
  3. Make calling web services easier - this is self-explanatory.
  4. Better Settings model - there is no standard way to do settings today for 3rd party Gadgets. We would like to move to a more declarative model.
  5. Better Localization model - we provide an API for you to query the current locale but we don't provide much other support otherwise. This is not a big problem for most Gadgets, but it would be nice for more advanced Gadgets.

I can probably think of 5 or 6 others, but I'd rather have you tell me what you think are the important things you want to see.

I've been waiting for the Gadgets SDK to ship for a while so I could rewrite my MSN Spaces Photo Album gadget and turn it into an article. Expect to see an article about this from me before the end of next month. Kudos to James, Scott Isaacs and the rest of the gadgets gang for getting this out. 


 

Categories: Windows Live

If you are a developer interested in building applications or mashups with Windows Live APIs then you should keep an eye on http://dev.live.com. This will be a complementary site to the Windows Live developer center on MSDN. The site hasn't launched yet but we already have some content on there such as the Virtual Earth interactive SDK. Expect more details to seep out the closer we get to TechEd 2006. In the meantime, you can keep an eye on Ken Levy's blog to get the skinny.

Having a developer center on MSDN and a separate community-centric site hosted on live.com is something I proposed a couple of months ago. This model has seemed to work for http://msdn.microsoft.com/aspnet and http://www.asp.net where the former site is where official documentation and downloads live while the latter is where you find screencasts, forums and other more interactive content. I expect there will be a bunch of crosslinking between http://dev.live.com and http://msdn.microsoft.com/live once both sites get rolling along.

PS: It's kinda crazy seeing how many recommendations from my thinkweek paper are actually being implemented. I definitely will be writing another one this fall.


 

Categories: Windows Live

From the press release MSN Spaces Now Largest Blogging Service Worldwide we learn

REDMOND, Wash. — May 24, 2006 — MSN® Spaces is the most widely used blogging service worldwide with more than 100 million unique visitors, according to data released today by comScore Networks Inc. of Reston, Va., an independent Internet audience measurement and consulting company.

comScore World Metrix’s proprietary audience report for April 2006 showed the total number of unique visitors to MSN Spaces has more than doubled in the past 12 months, from 41.65 million to 101 million.* Figures compiled by comScore Media Metrix indicate that during April 2006, nearly one in seven Internet users worldwide had visited MSN Spaces.

MSN Spaces allows consumers to create personal Internet sites where they can express themselves in a variety of ways and interact with the important people in their life. The service provides people with a place to create and update a Web log, or blog, as well as share photos, music playlists and more. For example, more than 6 million photos are uploaded to the service each day, with more than 2.5 billion photos uploaded since MSN Spaces launched as a beta service in December 2004.

It's quite cool to realize that I've been working on the MSN Windows Live communications services platform team for about a year and a half, building the world's most popular blogging service and supporting the world's most popular instant messaging client to boot. I guess since we haven't rolled out the social networking features of MSN Spaces across the entire site, we can't be called the world's most popular social networking service. Yet.

Thanks to all our users who keep using our services and giving us great feedback. You rock. We have lots of good stuff planned for y'all in the coming months.


 

Categories: Windows Live