September 30, 2005
@ 08:14 PM

There have been a number of amusing discussions in the recent back and forth between Robert Scoble and several others on whether OPML is a crappy XML format. In posts such as OPML "crappy" Robertson says and More on crappy formats Robert defends OPML. I've seen some really poor arguments made as people rushed to bash Dave Winer and OPML but none made me want to join the discussion until this morning.

In the post Some one has to say it again… brainwagon writes

Take for example Mark Pilgrim's comments:

I just tested the 59 RSS feeds I subscribe to in my news aggregator; 5 were not well-formed XML. 2 of these were due to unescaped ampersands; 2 were illegal high-bit characters; and then there's The Register (RSS), which publishes a feed with such a wide variety of problems that it's typically well-formed only two days each month. (I actually tracked it for a month once to test this. 28 days off; 2 days on.) I also just tested the 100 most recently updated RSS feeds listed on blo.gs (a weblog tracking site); 14 were not well-formed XML.

The reason just isn't that programmers are lazy (we are, but we also like stuff to work). The fact is that the specification itself is ambiguous and weak enough that nobody really knows what it means. As a result, there are all sorts of flavors of RSS out there, and parsing them is a big hassle.

The promise of XML was that you could ignore the format and manipulate data using standard off-the-shelf-tools. But that promise is largely negated by the ambiguity in the specification, which results in ill-formed RSS feeds, which cannot be parsed by standard XML feeds. Since Dave Winer himself managed to get it wrong as late as the date of the above article (probably due to an error that I myself have done, cutting and pasting unsafe text into Wordpress) we really can't say that it's because people don't understand the specification unless we are willing to state that Dave himself doesn't understand the specification.

As someone who has (i) written a moderately popular RSS reader and (ii) worked on the XML team at Microsoft for three years, I know a thing or two about XML-related specifications. Blaming malformed XML in RSS feeds on the RSS specification is silly. That's like blaming the large number of HTML pages that don't validate on the W3C's HTML specification, rather than on the fact that web browsers try to render invalid pages instead of raising an error. If web browsers refused to render invalid pages, such pages wouldn't exist on the Web.

Similarly, if every aggregator rejected invalid feeds then invalid feeds wouldn't exist. However, just like in the browser wars, aggregator authors consider it a competitive advantage to be able to handle malformed feeds. This has nothing to do with the quality of the RSS specification [or the HTML specification]; it is all about applications trying to gain market share.
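To make the point concrete, here is a minimal Python sketch of what a strict aggregator would do: hand the feed to a stock XML parser and refuse anything that isn't well-formed. The function name is_well_formed and the two sample feeds are made up for illustration; this is not how RSS Bandit or any other particular aggregator is implemented.

```python
import xml.etree.ElementTree as ET


def is_well_formed(feed_text: str) -> bool:
    """Return True if feed_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(feed_text)
        return True
    except ET.ParseError:
        # A strict aggregator would stop here and report the feed as broken.
        return False


# Hypothetical sample feeds: the only difference is the escaped ampersand.
GOOD_FEED = '<rss version="2.0"><channel><title>Q&amp;A</title></channel></rss>'
BAD_FEED = '<rss version="2.0"><channel><title>Q&A</title></channel></rss>'

if __name__ == "__main__":
    for name, feed in (("good", GOOD_FEED), ("bad", BAD_FEED)):
        print(name, "accepted" if is_well_formed(feed) else "rejected")
```

The second feed fails for the same reason as two of the feeds in Mark Pilgrim's sample: an unescaped ampersand. A liberal aggregator would instead try to patch the text up and carry on, which is exactly the competitive pressure described above.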

As for whether OPML is a crappy spec? I've had to read a lot of technology specifications in my day, from W3C recommendations and IETF RFCs to API documentation and informal specs. They all suck in their own ways. However, experience has taught me that the bigger the spec, the more it sucks. Given that, I'd rather have a short, human-readable spec that sucks a little (e.g. RSS, XML-RPC, OPML, etc.) than a large, jargon-filled specification which sucks a whole lot more (e.g. WSDL, XML Schema, C++, etc.). Then there's the issue of using the right tool for the job, but I'll leave that rant for another day.


 

Categories: XML

While using Firefox this morning, I just realized something was missing. There is a Google Toolbar for Firefox, there is a Yahoo! Toolbar for Firefox, so how come there isn't an MSN Toolbar for Firefox? Just yesterday, Ken Moss who runs the MSN Search team posted on their blog about MSN Search Plugins for Firefox where he wrote

However, some of our customers prefer using Firefox and we respect that choice.  Some developers in our user community have created Firefox plug-ins to make it easy to do searches on MSN from the Firefox search box.  Even though it’s currently buried in Firefox under “Add Engines… Find lots of other search engines…”, it seems that our customers have been finding it since we’re listed as one of the most popular search engine plugins.

I use Firefox sometimes in the course of my job – and when I do, I love having the MSN Search engine plugged-in up in the chrome.  If you’re currently a Firefox user – I hope you’ll enjoy this little nugget. For more MSN Search fun with Firefox (or IE!), try out the PDC version of MSN Search enabled by a Trixie / Greasemonkey script.

It's cool to see the MSN Search team giving a shout out to plugins built by the developer community, but I think it would be even cooler if we stepped up to the plate like Yahoo! and Google have done by providing an official, full-fledged toolbar for Firefox.


 

Categories: MSN

September 29, 2005
@ 07:30 PM

Kitty came by my office to remind me that the Web 2.0 conference is next week. As part of the lead-up to the conference I can see the technology geek blogosphere is buzzing with the What is Web 2.0? discussion which was sparked off by Tim O'Reilly's posting of the Web 2.0 meme map created during FooCamp. The meme map is below for the few folks who haven't seen it.

The meme map is a visual indication that "Web 2.0" has joined "SOA" as a buzzword that is too ill-defined to have a serious technical discussion about. It is now associated with every hip trend on the Web. Social Networking? That's Web 2.0. Websites with APIs? That's Web 2.0. The Long Tail? That's Web 2.0. AJAX? That's Web 2.0. Tagging and Folksonomies? That's Web 2.0 too. Even blogging? Yep, Web 2.0.

I think the idea and trend towards the 'Web as a platform' is an important one and I find it unfortunate that the discussion is being muddied by hypesters who are trying to fill seats in conference rooms and sell books.

I'm in the process of updating my Bill Gates Thinkweek paper on MSN and Web platforms to account for the fact that some of my recommendations are now a reality (I helped launch http://msdn.microsoft.com/msn). More importantly, given recent developments, it needs to change tone from a call to action to something more prescriptive. One of the things I'm considering is removing references to "Web 2.0" in the paper given that it may cause a bozo bit to be flipped. What do you think?


 

Categories: Web Development

We're almost ready to begin public beta testing of our implementation of the MetaWeblog API for MSN Spaces. As with most technology specifications, the devil has been in the details of figuring out how common practice differs from what is in the spec.

One place where we hit on some gotchas is how dates and times are defined in the XML-RPC specification which the MetaWeblog API uses. From the spec

Scalar <value>s

<value>s can be scalars, type is indicated by nesting the value inside one of the tags listed in this table:

Tag                  Type        Example
<dateTime.iso8601>   date/time   19980717T14:08:55

The reason the above definition of a date/time type is a gotcha is that the date in the example is in the format YYYYMMDDTHH:MM:SS. Although this is a valid ISO 8601 date, most Web applications that support ISO 8601 dates support the subset defined in the W3C Note on Date and Time Formats, which is of the form YYYY-MM-DDTHH:MM:SS. It's a subtle but important difference.
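The difference is easy to see in code. Below is a rough Python sketch of a parser that accepts both spellings; the constant names and the parse_datetime helper are my own inventions for illustration and say nothing about how MSN Spaces actually handles it.

```python
from datetime import datetime

# Layout used in the XML-RPC spec example: 19980717T14:08:55
XMLRPC_FORMAT = "%Y%m%dT%H:%M:%S"
# Layout in the style of the W3C Note (no zone designator): 1998-07-17T14:08:55
W3C_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_datetime(text: str) -> datetime:
    """Parse a date/time written in either layout, trying the XML-RPC one first."""
    for fmt in (XMLRPC_FORMAT, W3C_FORMAT):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # wrong layout, try the next one
    raise ValueError(f"unrecognized date/time: {text!r}")


if __name__ == "__main__":
    print(parse_datetime("19980717T14:08:55"))
    print(parse_datetime("1998-07-17T14:08:55"))
```

A parser that only knows one of the two layouts raises an error on the other, which is the kind of interoperability surprise you only find once real clients start talking to your server.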

Another thing that had me scratching my head was related to timezones in XML-RPC. The spec states

  • What timezone should be assumed for the dateTime.iso8601 type? UTC? localtime?

    Don't assume a timezone. It should be specified by the server in its documentation what assumptions it makes about timezones.

This just seems broken to me. What if you are a generic blog posting client like Blogjet or W.Bloggar which isn't tied to one particular server? It would seem that the only sane thing that can happen here is for the dates & times from the server to always be used and for dates & times from clients to be ignored, since they are useless without timezones. If I get a blog post creation date of September 29th at 4:30 PM from a client, I can't use it, since without a timezone I'll likely date the entry incorrectly by anything from a few hours to an entire day.
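Here's a small Python sketch of the problem, assuming the client sends 4:30 PM on September 29th with no timezone. The as_utc helper and the two offsets (UTC-7 for Seattle in September, UTC+9 for Tokyo) are just examples I picked; nothing in XML-RPC tells the server which one applies.

```python
from datetime import datetime, timedelta, timezone


def as_utc(naive: datetime, assumed_offset_hours: int) -> datetime:
    """Interpret a timezone-less date/time under an assumed UTC offset."""
    assumed_zone = timezone(timedelta(hours=assumed_offset_hours))
    return naive.replace(tzinfo=assumed_zone).astimezone(timezone.utc)


# What the client sent: no timezone attached.
posted = datetime(2005, 9, 29, 16, 30)

# Two equally plausible guesses about where the client is (assumed offsets).
if_seattle = as_utc(posted, -7)
if_tokyo = as_utc(posted, 9)

if __name__ == "__main__":
    print("read as Seattle time:", if_seattle.isoformat())
    print("read as Tokyo time:  ", if_tokyo.isoformat())
    print("disagreement:        ", if_seattle - if_tokyo)
```

The two readings of the same string are 16 hours apart, and the server has no way of telling which one the client meant.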

It probably would have been better to retrofit timezones into the spec than just punting on the problem as is the case now. I wonder what other interesting gotchas are lurking out there for our beta testers to find. :)


 

I've been in the process of moving apartments so I haven't had free time to work on RSS Bandit. In the meantime, we've been getting lots of excellent bug reports from people using the alpha version of the Nightcrawler release. One of the bug reports we've gotten was that somewhere along the line we introduced a bug that caused significant memory consumption (hundreds of megabytes). Since I've been busy, Torsten tracked it down and wrote about it in his post More to Know About .NET Timers. He wrote

As you may know, .NET 1.1 supports three different Timer classes:

  1. Windows timers with the System.Windows.Forms.Timer class
  2. Periodical delegate calling with System.Threading.Timer class
  3. Exact timing with the System.Timers.Timer class

The timings are more or less accurate (see CodeProject: Multithreading in .NET), but that is not the point I want to highlight today. Two sentences from the mentioned codeproject article are important for this post:

"... Events raised from the windows forms timer go through the message pump (together with all mouse events and UI update messages)..."

and

"...the System.Timers.Timer class. It represents server-based timer ticks for maximum accuracy. Ticks are generated outside of our process..."

To report state and newly retrieved items from requested feeds we used a concept to serialize the asynchronous received results from background threads with the help of a timer. This was introduced in the NightCrawler Alpha Dare Obasanjo posted last week for external tests. Some users reported strange failures, memory hog up and bad UI behavior with this Alpha so I would suggest here to not use it anymore for testing if your subscribed feeds count is higher than 20 or 30 feeds.

The idea was not as bad as it seems to be (if you only look at the issues above). The real issue in our case was to use simply the wrong timer class! The UI state refresh includes an update of the unread counters that is reported to the user within the treeview as number postfixes and (more important here) a font refresh (as user decides, default is to mark the feed caption text bold).
...

So what happens exactly now if the timer fires? I used the CLR Profiler to get the following exiting results. The event is called in sync. with the SynchronizingObject, means Control::WndProc(m) calls calls into System.Windows.Forms.Control::InvokeMarshaledCallbacks void(), MulticastDelegate::DynamicInvokeImpl()... and then our event method OnProcessResultElapsed(). The allocation graph mentions 101 MB (44.78%) used here!
...

So what to do to fix the problem(s)? Simply use the Windows.Forms.Timer! Think about it: it is driven by the main window message pump and runs always in the right context of the main UI thread (no .InvokeRequired calls). Timing isn't an important point here, we just want to process one result each time we are called. Further: no cross-AppDomain security check should happen anymore! With that timer it is just a simple update control(s) with some fresh data!

So take care of the timer class(es) you may use in your projects! Check their implications!

Tracking down bugs is probably one of the most satisfying and yet frustrating things about programming. I'm glad we got to the root of this problem.
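For those who don't live in .NET, the pattern Torsten describes (background threads produce results, and a timer on the UI thread handles one result per tick) can be sketched roughly as follows. This is Python rather than C#, it is not RSS Bandit's actual code, and the names fetch_feed and on_tick are hypothetical; in the real application the tick would come from the Windows Forms timer via the message pump.

```python
import queue
import threading

# Results produced by background threads, waiting for the UI thread.
results: "queue.Queue[str]" = queue.Queue()


def fetch_feed(name: str) -> None:
    """Background worker: pretend to download a feed and queue the result."""
    results.put(f"new items for {name}")


def on_tick(update_ui) -> bool:
    """Timer callback on the UI thread: handle at most one queued result.

    Returns True if a result was processed, False if the queue was empty.
    """
    try:
        item = results.get_nowait()
    except queue.Empty:
        return False
    update_ui(item)  # already on the UI thread, so no marshaling is needed
    return True


if __name__ == "__main__":
    workers = [threading.Thread(target=fetch_feed, args=(n,))
               for n in ("feed-a", "feed-b", "feed-c")]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    shown = []
    while on_tick(shown.append):  # stand-in for repeated timer ticks
        pass
    print(shown)
```

The point of the design is that the only code touching the UI runs in the tick handler, so the worker threads never need to call back into the UI themselves.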

By the way, don't forget to cast your vote in the RSS Bandit Logo Design contest. The time has come for us to update the imagery related to the application and we thought it'd be great to have both the new logo and the decision on what it should be in the hands of our users.

 


 

Categories: RSS Bandit

Robert Scoble has a post entitled Search for Toshiba music player demonstrates search engine weakness where he complains about relevance of search results returned by popular web search engines. He writes

Think search is done? OK, try this one. Search for:

Toshiba Gigabeat

Did you find a Toshiba site? All I see is a lot of intermediaries.

I interviewed a few members of the MSN Search team last week and I gave them hell about this. When I'm writing I want to link directly to the manufacturer's main Web site about their product. Why? Because that's the most authoritative place to go.

But I keep having trouble finding manufacturer's sites on Google, MSN, and Yahoo.

Relevancy ratings on search engines still suck. Let's just be honest about it as an industry.

Can the search researchers find a better algorithm? I sure hope so.

Here, compare for yourself. If you're looking for the Toshiba site, did you find what you're looking for when you do searches Google ? On Yahoo ? On MSN ?

Here's the right answer: http://www.gigabeat.toshiba.com . Did you find it with any of the above searches? I didn't.

The [incorrect] assumption in Robert Scoble's post is that the most relevant website for a person searching for information on a piece of electronic equipment is the manufacturer's website. Personally, if I'm considering buying an MP3 player or other electronic equipment I'm interested in (i) reviews of the product and (ii) places where I can buy it. In both cases, I'd be surprised if the manufacturer's website were the best place to go.

Relevancy of search results often depends on context. This is one of the reasons why the talk on Vertical Search and A9.com at ETech 2005 resonated so strongly with me. The relevancy of search results sometimes depends on what I want to do with the results. A9.com tries to solve this by allowing users to customize the search engines they use when they come to the site. Google has attempted to solve this by mixing traditional web search results with vertical results inline. For example, searching for MSFT on Google returns traditional search results and a stock quote. Also, searching for "Seattle, WA" on Google returns traditional web search results and a map. And finally, searching for "Toshiba Gigabeat" on Google returns traditional web search results and a list of places where you can buy one.

Even with these efforts, it is unlikely any of them would solve the problem Scoble had as well as if he just used less ambiguous searches. For example, a better test of relevance is which search engine returns the manufacturer's website for the query "toshiba gigabeat website".

I found the results interesting and somewhat surprising. There is definitely a long way to go in web search.


 

Categories: Technology

From the press release MSN Launches Paid-Search Service in France and Singapore we learn

NEW YORK — Sept. 26, 2005 — Today, Yusuf Mehdi, senior vice president of the MSN® Information Services & Merchant Platform division, opened the second annual Advertising Week 2005 in New York City by announcing the official launch of adCenter in France and Singapore. adCenter powers a paid-search service from MSN that provides advanced audience intelligence and targeting capabilities to help advertisers improve their return on investment when it comes to paid-search advertising. The official launch of adCenter in France today and Singapore on Aug. 31 follows successful pilot programs in both countries. U.S. testing of adCenter is set to begin in October.
...
Powerful campaign management tools and deep audience intelligence unique to MSN make it easy for advertisers to optimize and refine their campaigns to reach a specific audience. Some of those tools include the following:

  • Keyword Selection allows advertisers to indicate whom they want to reach based on geographic location, gender, age range, time of day and day of week, and suggests keywords based on the desired audience.

  • Site Analyzer assists advertisers by suggesting keywords based on the content of their Web site, rather than on another keyword.

  • Audience Profiler provides advertisers with an expected profile of those customers who are most likely to search for specific keywords.

  • Cost Estimator helps advertisers remain within their budget by estimating rank, traffic and cost per month per keyword.

  • Campaign Optimization allows advertisers to respond quickly and decisively throughout the campaign to easily refine budget allocations and keywords, as well as apply targeting filters such as geographic, demographic and dayparting.

  • Post Sales Audience Intelligence & Reporting provides advertisers with detailed reports on campaign performance and audiences reached including click-through rate, estimated position and spending levels.

"We’re excited by the positive feedback we have received from advertisers thus far," Mehdi said. "The launch of adCenter in France and Singapore is a great first step to delivering on our global vision to connect advertisers to consumers in a much more meaningful way."

In the near future, adCenter will become a one-stop shop from which advertisers can manage all their MSN advertising campaigns, end to end, including display and direct ads. In addition, advertisers will be able to use advanced targeting tools and audience intelligence data to reach their desired audiences across the MSN network. Advertisers interested in learning more or signing up for adCenter can go to http://advertising.msn.com.

Our team got a demo of adCenter a few months ago and it definitely looks like it hits all the right points, which is impressive for a version 1.0 offering. Given how much revenue MSN gets from advertising, it's good to see us giving advertisers more tools to improve the value they get from advertising on MSN. Given that competitors like Yahoo! and Google already have offerings in this space, adCenter is an overdue addition to our stable of services.


 

Categories: MSN

September 23, 2005
@ 02:06 PM

From Omar Shahine's post mail.start.com we learn

Well, we launched Kahuna Milestone 3 (M3) yesterday with a new URL (http://mail.start.com). We are building Kahuna iteratively, and plan on releasing much goodness on a frequent basis. This is very different from the way that Hotmail and MSN has typically released software, but we feel it’s the best way to achieve success.

As I mentioned recently I've been using the Hotmail beta for a while now and it's a phenomenal improvement over the current version. Below is a screenshot of the beta in action.

Hotmail Beta screenshot

If you'd like invites to the beta, you should keep an eye on the Hotmail team's space. You can also find more screenshots of the Hotmail beta there.


 

Categories: MSN

September 23, 2005
@ 01:40 PM

I got an email about Six Apart's Project Comet yesterday and it definitely made me smile. It's good to see more validation of the idea that personal publishing (weblogging) should evolve into personal self-expression (my blog, my photos, my relationships, my media, etc). This is the same direction taken by services such as MySpace, MSN Spaces and Yahoo! 360°.

Weblogs are replacing the personal homepage and thus, to reflect all the facets of one's personality, they need to be more than 'my online journal'. Most of the big vendors in this space have cottoned on to this idea slowly but surely. I wonder when the penny will drop over in the Blogger offices at Google.


 

Categories: Social Software

I've been participating in the inaugural podcast for the Microsoft Architecture series. From the website

ARCast is an ongoing podcast series created by the Architect Strategy Team with the goal of spawning insightful, enlightening and sometimes contentious conversations about the hottest topics in Architecture today

The topic for the first series of podcasts is the problems facing interoperability in web services today. The participants are myself, Jeffrey Schlimmer, Michele Leroux Bustamante, Roger Sessions, and Chris Haddad. Most of the discussion has been about adoption of WS-*. Even though I work on web services that are utilized by millions of people every day, I've never really found the technologies outside the core XML web services standards (SOAP/WSDL/XSD) to be of much importance to our main scenarios.


 

Categories: XML Web Services