I'm using the Windows Background Intelligent Transfer Service (BITS) as the technology for downloading podcasts in the background so that RSS Bandit doesn't hog too much bandwidth while downloading the latest Ze Frank video. However, it came to my attention that there are certain conditions that must hold before BITS can be clever about downloading a file from a website in the background. The conditions are spelled out in the HTTP Requirements for BITS Downloads, which states

BITS supports HTTP and HTTPS downloads and uploads and requires that the server supports the HTTP/1.1 protocol. For downloads, the HTTP server's Head method must return the file size and its Get method must support the Content-Range and Content-Length headers. As a result, BITS only transfers static file content and generates an error if you try to transfer dynamic content, unless the ASP, ISAPI, or CGI script supports the Content-Range and Content-Length headers.

This means you can't use BITS to download podcasts from the feeds of sites such as C|Net MP3 Insider because it doesn't provide a Content-Length header when retrieving podcasts. Due to this limitation I've had to implement a fallback mode where we use a direct HTTP download request to retrieve the podcast. This solution is problematic if large video files are downloaded in this manner because all of the PC's bandwidth may end up being consumed by the task. For this reason, I've borrowed a leaf from the RSS platform in IE 7 and will only support this fallback for podcasts that are 15MB or less.
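
For illustration, here's a minimal sketch of how one might probe a server with an HTTP HEAD request to decide whether a download can be handed to BITS or must use the direct fallback. This isn't RSS Bandit's actual code; the class and method names are hypothetical, and checking Accept-Ranges is only a rough proxy for the Content-Range requirement.

// A sketch of deciding whether a podcast enclosure can be handed off to BITS
// or must use the direct HTTP fallback. Hypothetical names, not RSS Bandit's code.
using System;
using System.Net;

class EnclosureDownloadPlanner
{
    // Mirrors the 15MB cutoff for the direct download fallback described above.
    const long MaxDirectDownloadBytes = 15L * 1024 * 1024;

    public static bool CanUseBits(Uri enclosureUrl)
    {
        HttpWebRequest request = (HttpWebRequest) WebRequest.Create(enclosureUrl);
        request.Method = "HEAD"; // BITS requires that HEAD return the file size

        using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
        {
            // ContentLength is -1 when the server sent no Content-Length header.
            bool reportsSize = response.ContentLength > 0;
            // Advertised range support is a rough proxy for Content-Range handling.
            bool acceptsRanges = "bytes".Equals(response.Headers["Accept-Ranges"]);
            bool isHttp11 = response.ProtocolVersion >= HttpVersion.Version11;
            return reportsSize && acceptsRanges && isHttp11;
        }
    }

    public static bool CanFallBackToDirectDownload(long expectedSize)
    {
        // Unknown sizes (-1) are allowed through; the limit is best-effort anyway.
        return expectedSize <= MaxDirectDownloadBytes;
    }
}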

I sampled a number of files over 15MB at http://audio.weblogs.com and didn't see many that were served by a Web server that didn't meet the BITS requirements. Of course, I might be mistaken and there may be some popular podcast that regularly provides files over 15MB and doesn't meet the conditions set forth by BITS. In that case, I'd consider raising the limit or providing a config file option for increasing it.


 

Categories: RSS Bandit

December 19, 2006
@ 02:45 PM

My girlfriend recently purchased an iDog for one of her kids and I thought that was the silliest iPod accessory imaginable. It seems I was wrong. Podcasting News has an article entitled The Ten Worst iPod-Related Christmas Presents Ever which has gems such as

iPod Toilet Paper Dispenser

Here’s something that we thought we should flush out of our system right away - the iCarta toilet paper dispenser/iPod player. The last thing we want anyone doing in the Podcasting News bathroom is making a #$#@ playlist for using the toilet.
The one that really takes the cake is the iBuzz. You'll have to read the article to see what that accessory does.
 

Earlier today I noticed a link from Mike Torres to a press release from ComScore Media Metrix entitled The Score: Blogs Gain Favor Worldwide which states

In recent years blogs have garnered significant media coverage in the United States for their ability to reach a wide audience. With more than one-third of the online population in the United States visiting blogs within a given month, it is clear that the category has become mainstream. An analysis of blog penetration by country in North America and Western Europe shows that the popularity of blogs is a worldwide phenomenon.
...
  • Windows Live Spaces is the favorite blog site among the majority of countries studied, with 37 percent of all Canadians visiting the site in October 2006. Blogger.com had the highest penetration in the United States (12.4 percent) and Germany (9.7 percent), while the same was true for Skyblog in France (27.4 percent).

Interesting statistics, although I wonder whether ComScore is including social networking sites like Bebo and MySpace in its reckoning. Based on how ComScore usually scores things, my assumption is that it is going by number of unique users instead of page views, which is where heavily trafficked social networking sites like Bebo and MySpace reign supreme.


 

Categories: Social Software | Windows Live

December 19, 2006
@ 02:02 PM

Brady Forrest over on the O'Reilly Radar blog just posted Google Deprecates Their SOAP Search API, in which he states

In an odd move Google has quietly deprecated their Search SOAP API, will no longer be issuing keys, and have removed the SDK from their site. They did not even issue a blog post about it. They will continue (for how long?) to support existing users, but will not do any bug fixes. They are urging developers to use their AJAX Search API (Radar post) instead.

The AJAX Search API is great for web applications and users that want to bling their blog, but does not provide the flexibility of the SOAP API. I am surprised that it has not been replaced with a GData API instead. The developer community has been discussing this and do not seem happy with the change. Discussion on the forums have pointed out that Yahoo! has a REST Search API. Live Search also has a SOAP API available.

I find it odd that Brady is surprised by this move. Brady used to work on APIs for the MSN Windows Live Search team, so he should know first hand that the value of search APIs was always questioned. Unlike data APIs such as the MetaWeblog API, the Flickr API or the del.icio.us API, which extend the reach of a service and add value via network effects, the search APIs provided by the major search engines do no such thing. With the data APIs one can argue that making it easier for people to add content to sites increases their value; on the other hand, making it easier for people to run search queries without seeing highly lucrative search ads doesn't make much business sense.

This reminds me of a quote from Bill Gates taken by Liz Gannes in her report Bill Gates on the Future of Web Apps which is excerpted below

We each got to ask Gates one question. I asked which applications he forecast to live within the browser and which outside of it.

He replied that the distinction would come to be silly from a technical standpoint, but that the necessary movement toward web APIs does present challenges on the business side. “One of the things that’s actually held the industry back on this is, if you have an advertising business model, then you don’t want to expose your capabilities as a web service, because somebody would use that web service without plastering your ad up next to the thing.”

His solution wasn’t very specific: “It’s ideal if you get business models that don’t force someone to say ‘no, we won’t give you that service unless you display something right there on that home page.’”

The quote seems particularly relevant now when you consider that Google has replaced a web service with their AJAX Search API which is a widget that is easier to monetize. I'd also note that Scoble telegraphed that this move was coming in his post Google changes its monetization strategy toward a Microsoft one? which implies that Google AdSense will be bundled with usage of Google's search widgets.


 

December 15, 2006
@ 03:09 AM

Moishe Lettvin: Large companies and 'A' talent

But then I got an offer from Google and after a little bit of waffling (I was having much fun with the hackers) I started there back in January. And holy shit I hope I can convey to you what sort of geek heaven I'm in now.

Above I talked about NT4 being the "new hotness" back in '94 -- the guys who made it that way sit right next to me. In the same office. And that sort of expertise is everywhere here... it seems like every office is occupied by at least a couple of industry leaders, guys whose names you'd recognize if you're even a casual observer of geek culture.

Google's culture values independence and transparency of communication in ways I didn't think were possible at a large company. We've of course got our 20% time, but beyond that there's a sense that everyone here is competent enough and trustworthy enough to be clued in to many parts of the business -- not just engineering -- which would typically be hidden. That trust nets huge gains in loyalty and excitement.

There aren't many places in the world where you can come up with the idea for a feature or product, implement it, and launch it to an audience of millions, with the infrastructure to support it. Yes, you can do it at a startup or on your own, but getting eyeballs and servers is non-trivial. For every YouTube there are hundreds of sites nobody's heard of.

Aaron Swartz: The Goog Life: how Google keeps employees by treating them like kids

The dinosaurs and spaceships certainly fit in with the infantilizing theme, as does the hot tub-sized ball pit that Googlers can jump into and throw ball fights. Everyone I know who works there either acts childish (the army of programmers), enthusiastically adolescent (their managers and overseers), or else is deeply cynical (the hot-shot programmers). But as much as they may want to leave Google, the infantilizing tactics have worked: they're afraid they wouldn't be able to survive anywhere else.

Google hires programmers straight out of college and tempts them with all the benefits of college life. Indeed, as the hiring brochures stress, the place was explicitly modeled upon college. At one point, I wondered why Google didn't just go all the way and build their own dormitories. After all, weren't the late-night dorm-room conversations with others who were smart like you one of the best parts of college life? But as the gleam wears off the Google, I can see why it's no place anyone would want to hang around for that long. Even the suburban desert of Mountain View is better.

Google's famed secrecy doesn't really do a very good job of keeping information from competitors. Those who are truly curious can pick up enough leaks and read enough articles to figure out how mostly everything works. But what it does do is create an aura of impossibility around the place. People read the airbrushed versions of Google technologies in talks and academic papers and think that Google has some amazingly large computer lab with amazingly powerful technology. But hang around a Googler long enough and you'll hear them complain about the unreliability of GFS and how they don't really have enough computers to keep up with the load.

"It's always frightening when you see how the sausage actually gets made," explains a product manager. And that's exactly what the secrecy is supposed to prevent. The rest of the world sees Google as this impenetrable edifice with all the mysteries of the world inside ("I hear once you've worked there for 256 days they teach you the secret levitation," explains xkcd) while the select few inside the walls know the truth -- there is no there there -- and are bound together by this burden.

The truth is always somewhere in between.


 

Mark Baker has a blog post entitled Validation considered harmful where he writes

We believe that virtually all forms of validation, as commonly practiced, are harmful; an anathema to use at Web scale. Specifically, our argument is this;
Tests of validity which are a function of time make the independent evolution of software problematic.

Why? Consider the scenario of two parties on the Web which want to exchange a certain kind of document. Party A has an expensive support contract with BigDocCo that ensures that they’re always running the latest-and-greatest document processing software. But party B doesn’t, and so typically lags a few months behind. During one of those lags, a new version of the schema is released which relaxes an earlier stanza in the schema which constrained a certain field to the values “1”, “2”, or “3”; “4” is now a valid value. So, party B, with its new software, happily fires off a document to A as it often does, but this document includes the value “4” in that field. What happens? Of course A rejects it; it’s an invalid document, and an alert is raised with the human administrator, dramatically increasing the cost of document exchange. All because evolvability wasn’t baked in, because a schema was used in its default mode of operation; to restrict rather than permit.

This doesn't seem like a very good argument to me. The fact that you enforce that the XML documents you receive must follow a certain structure or conform to certain constraints does not mean that your system cannot be flexible in the face of new versions. First of all, every system does some form of validation because it cannot process arbitrary documents. For example, an RSS reader cannot do anything reasonable with an XBRL or ODF document, no matter how liberal it is in what it accepts. Now that we have accepted that certain levels of validation are no-brainers, the next question is what happens if there are no constraints on the values of elements and attributes in an input document. Let's say we have a purchase order format which in v1 has a <currency> element that can have a value of "U.S. dollars" or "Canadian dollars", and in v2 we now support any valid currency. What happens if a v2 document is sent to a v1 client? Is it a good idea for such a client to muddle along even though it can't handle the specified currency?
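
To make the scenario concrete, here's a minimal sketch in C# of a v1 client whose schema validation trips over a v2 document. The schema, element and values come from the hypothetical purchase order example above; none of this is from Mark's post.

// A sketch of the hypothetical purchase order scenario: a v1 client whose
// schema constrains <currency> rejects a v2 document out of hand.
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class StrictV1Client
{
    const string V1Schema = @"
      <xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
        <xs:element name='currency'>
          <xs:simpleType>
            <xs:restriction base='xs:string'>
              <xs:enumeration value='U.S. dollars'/>
              <xs:enumeration value='Canadian dollars'/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:schema>";

    static void Main()
    {
        // A v2 document using a currency the v1 schema has never heard of.
        const string v2Document = "<currency>Euros</currency>";

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(V1Schema)));
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            // The v1 client's choice point: raise an alert to an administrator
            // (costly) or muddle along with a currency it cannot process?
            Console.WriteLine("Validation error: " + e.Message);
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(v2Document), settings))
        {
            while (reader.Read()) { } // the handler fires on the invalid value
        }
    }
}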

As with all things in software, there are no hard and fast rules about what is right and what is wrong. In general, it is better to be flexible than not, as the success of HTML and RSS has shown us, but that does not mean flexibility is acceptable in every situation. And it comes with its own set of costs, as the success of HTML and RSS has shown us. :)

Sam Ruby puts it more eloquently than I can in his blog post entitled Tolerance.


 

Categories: XML | XML Web Services

December 14, 2006
@ 05:09 PM

For the past few months I've noticed some problems with viewing feeds of sites hosted on TypePad in RSS Bandit. The problem was that every other post in a feed would display raw markup instead of correctly rendered HTML. I decided to look into it this morning and tracked down the cause. Take a look at http://blog.flickr.com/flickrblog/atom.xml. Here are the relevant excerpts from the feed


<content type="html" xml:lang="en-ca" xml:base="http://blog.flickr.com/flickrblog/">
&lt;div xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;&lt;p&gt;&&nbsp;&lt;/p&gt;
&lt;div style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;http://www.flickr.com/gift/&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://us.i1.yimg.com/us.yimg.com/i/ww/news/2006/12/12/gtfof.gif&quot; style=&quot;padding-bottom: 6px;&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;p&gt;It&#39;s now easier than ever to spread joy this holiday season by giving the &lt;a href=&quot;http://www.flickr.com/gift/&quot;&gt;&lt;strong&gt;Gift of Flickr&lt;/strong&gt;&lt;/a&gt;. You can purchase a special activation code that you can give to anyone, whether or not they have an existing Flickr account. We&#39;ve even created a special Gift Certificate card that you can print out yourself, fold up and stuff in a stocking, under a tree or hidden away for after the candles are lit (of course, you can also send the gift code in an email).&lt;/p&gt;

&lt;p&gt;And it&#39;s even better to give the gift of Flickr since now your recipients will get &lt;a href=&quot;http://www.flickr.com/help/limits/#28&quot;&gt;&lt;strong&gt;unlimited uploads&lt;/strong&gt;&lt;/a&gt; — the two gigabyte monthly limit is no more (&lt;em&gt;yep, pro users have no limits on how many photos they can upload&lt;/em&gt;)! At the same time, we&#39;ve upped the limit for free account members as well, from &lt;a href=&quot;http://www.flickr.com/help/limits/#28&quot;&gt;&lt;strong&gt;20MB per month up to 100MB&lt;/strong&gt;&lt;/a&gt; (yep, five times more)!&lt;/p&gt;

&lt;p&gt;The Flickr team also wants to take this opportunity to thank you for a wonderful year and wish you and yours all the best of the season. Yay!&lt;/p&gt;&lt;/div&gt;
</content>
...
<content type="xhtml" xml:lang="en-ca" xml:base="http://blog.flickr.com/flickrblog/">
<div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.flickr.com/photos/eye_spied/313572883/" title="Photo Sharing"><img width="500" height="357" border="0" src="http://static.flickr.com/117/313572883_8af0cddbc7.jpg" alt="Dec 2 2006 208 copy" /></a></p>

<p><a title="Photo Sharing" href="http://www.flickr.com/photos/mrtwism/71294604/"><img width="500" height="375" border="0" alt="riding" src="http://static.flickr.com/34/71294604_b887c01815.jpg" /></a></p>

<p>See more photos in the <a href="http://www.flickr.com/photos/tags/biggame/clusters/cal-berkeley-stanford/">"Berkeley," "Stanford," "big game" cluster</a>.</p>

<p>Photos from <a href="http://www.flickr.com/photos/eye_spied/" title="Link to caryniam's photos">caryniam</a> and <a title="Link to mrtwism's photos" href="http://www.flickr.com/photos/mrtwism/">mrtwism</a>.</p></div>
</content>

So the first mystery is solved. The reason some posts look OK and some don't is that TypePad, for whatever reason, alternates between escaped HTML and well-formed XHTML as the content of entries in the feed. When the feed uses well-formed XHTML the item looks fine, but when it uses escaped HTML it looks like crap. The next question is why the items aren't rendered correctly when escaped HTML is used.

So I referred to section 3.1 of the Atom 0.3 specification and saw the following

3.1.2  "mode" Attribute

Content constructs MAY have a "mode" attribute, whose value indicates the method used to encode the content. When present, this attribute's value MUST be listed below. If not present, its value MUST be considered to be "xml".

"xml":
A mode attribute with the value "xml" indicates that the element's content is inline xml (for example, namespace-qualified XHTML).
"escaped":
A mode attribute with the value "escaped" indicates that the element's content is an escaped string. Processors MUST unescape the element's content before considering it as content of the indicated media type.
"base64":
A mode attribute with the value "base64" indicates that the element's content is base64-encoded [RFC2045]. Processors MUST decode the element's content before considering it as content of the the indicated media type.

To prevent aggregators from having to use their psychic powers to determine whether an item contains plain text or escaped HTML, the Atom folks introduced a mode attribute that indicates whether the content should be treated as-is or unescaped. As you can see, the default value for this attribute is "xml", not "escaped". Since the TypePad Atom feeds do not state that their HTML content is escaped, an aggregator is not expected to unescape the content before rendering it. Second mystery solved: buggy feeds are the culprit.
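
Here's a minimal sketch of what a conforming Atom 0.3 content handler looks like under those rules. This is not RSS Bandit's actual code and the names are hypothetical:

// A sketch of the Atom 0.3 rules quoted above: content is inline XML
// unless the feed explicitly says otherwise.
using System;
using System.Text;
using System.Xml;

class AtomContentHandler
{
    public static string GetContentAsHtml(XmlElement content)
    {
        switch (content.GetAttribute("mode"))
        {
            case "escaped":
                // The XML parser has already turned &lt;p&gt; back into <p>,
                // so the text of the element is the HTML to render.
                return content.InnerText;
            case "base64":
                return Encoding.UTF8.GetString(Convert.FromBase64String(content.InnerText));
            default:
                // No mode attribute means "xml": inline XHTML, render as-is.
                // TypePad's feeds omit the attribute yet contain escaped HTML,
                // which is why a conforming reader shows raw markup.
                return content.InnerXml;
        }
    }
}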

Even though these feeds are broken, it is probably faster for me to special-case feeds from TypePad than to track down and convince the folks at Six Apart that this is a bug worth fixing. This issue will be fixed in the next beta of the Jubilee release of RSS Bandit.


 

December 13, 2006
@ 03:05 AM

Six Months Ago: 10 people who don't matter

Mark Zuckerberg
Founder, Facebook
In entrepreneurship, timing is everything. So we'll give Zuckerberg credit for launching his online social directory for college students just as the social-networking craze was getting underway. He also built it right, quickly making Facebook one of the most popular social-networking sites on the Net. But there's also something to be said for knowing when to take the money and run. Last spring, Facebook reportedly turned down a $750 million buyout offer, holding out instead for as much as $2 billion. Bad move. After selling itself to Rupert Murdoch's Fox for $580 million last year, MySpace is now the Web's second most popular website. Facebook is growing too - but given that MySpace has quickly grown into the industry's 80-million-user gorilla, it's hard to imagine who would pay billions for an also-ran.

Today: Yahoo’s “Project Fraternity” Docs Leaked

At Yahoo, the long running courtship has lasted at least as long as this year, and is internally referred to as “Project Fraternity.” Leaked documents in our possession state that an early offer was $37.5 million for 5% of the company (a $750 million valuation) back in Q1 2006. This was rejected by Facebook.

Things really heated up mid year. Yahoo proposed a $1 billion flat out acquisition price based on a model they created where they projected $608 million in Facebook revenue by 2009, growing to $969 million in 2010. By 2015 Yahoo projects that Facebook would generate nearly $1 billion in annual profit. The actual 2006 number appears to be around $50 million in revenue, or nearly $1 million per week.

These revenue projections are based on robust user growth. By 2010, Yahoo assumes Facebook would hit 48 million users, out of a total combined highschool and young adult population of 83 million.

Our sources say that Facebook flatly rejected the $1 billion offer, looking for far more. Yahoo was prepared to pay up to $1.62 billion, but negotiations broke off before the offer could be made.


 

Nick Bradbury, the author of the excellent FeedDemon RSS reader, has a blog post entitled Simplicity Ain't So Simple, Part II: Stop Showing Off where he writes

One mistake I see developers make over and over again is that we make a feature look complicated just because it was hard to create.
...
For example, the prefetching feature I blogged about last week hasn't been easy to create.  This feature prefetches (downloads) links and images in your feeds so that they're browse-able inside FeedDemon when you're working offline.  It works in the background so you can keep using FeedDemon while it does its business, and it's smart enough to skip web bugs, links to large downloads, and other items that shouldn't be cached (including items that have already been cached in a previous session).

It didn't seem like a complex feature when I started on it, but it ended up being a lot more work than I anticipated.  It could easily be an application all by itself, complete with all sorts of configurable options.

But instead of turning this feature into a mini-application, I demoted it to a lowly menu item

I've had that feeling recently when thinking about a feature I'm currently working on as part of podcasting support in RSS Bandit. The feature is quite straightforward: the ability for users to specify a maximum amount of space dedicated to podcasts on their computer, to prevent their hard drive from filling up with dozens of gigabytes of ScobleShow and Channel 9 videos. Below is a screenshot of what the option looks like.

As I started to implement this feature, every question I asked myself led to two or three more questions and the complexity just spiralled. I started with the assumption that we'd enforce the download limit before files were downloaded. So if you have allocated 500MB as the maximum amount of space dedicated to podcasts and you attempt to download funny_video.mov (200MB), funny_song.mp3 (5MB) and scary_short_movie.mpg (300MB) in that order, then we will issue a warning or an error indicating that there won't be enough room to download the last file before attempting to download it. Here's where I got my first rude awakening: there's no guaranteed way to determine the size of a file before downloading it. There is a length attribute on the <enclosure> element, but in certain podcast feeds it doesn't have a valid value. Being a Web geek, I thought to myself, "Ha, I can always fall back on making an HTTP HEAD request and then reading the Content-Length header." It turns out this isn't always guaranteed to be set either.
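
Here's a minimal sketch of those two imperfect checks chained together. This isn't RSS Bandit's actual code; the names are hypothetical:

// A sketch of the two imperfect ways to guess an enclosure's size before
// downloading; returns -1 when neither source provides a usable value.
using System;
using System.Net;
using System.Xml;

class EnclosureSizeProbe
{
    public static long GetExpectedSize(XmlElement enclosure)
    {
        // 1. Trust the length attribute of <enclosure> if it parses to a real size.
        long length;
        if (long.TryParse(enclosure.GetAttribute("length"), out length) && length > 0)
            return length;

        // 2. Fall back to an HTTP HEAD request; Content-Length may be absent too.
        HttpWebRequest request = (HttpWebRequest) WebRequest.Create(enclosure.GetAttribute("url"));
        request.Method = "HEAD";
        using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
        {
            return response.ContentLength; // -1 when no Content-Length header was sent
        }
    }
}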

So now we have the possibility that the user could initiate three downloads which together exceed the 500MB she has allocated to enclosures. The next question was when to enforce the limit on the files being downloaded. Should we wait until the files have finished downloading and then fail when we attempt to move the downloaded file from the temporary folder to the user-specified podcast folder? Or should we stop downloads as soon as we hit 500MB regardless of the state of the downloaded files, which means we'd have to regularly collate the size of all pending downloads and add that to the size of all downloads in the podcast folder to ensure that we aren't over the limit? I was leaning towards the former, but when I talked to Torsten he pointed out that it seems like cheating if I limit the amount of space allocated to podcasts to 500MB but they could actually be taking up over 1GB on disk because I have four 300MB files being downloaded simultaneously. Unfortunately for me, I agreed. :)
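
A minimal sketch of that stricter accounting, which counts partially downloaded files in the temp folder against the limit alongside completed podcasts (hypothetical names, not RSS Bandit's actual code):

// A sketch of the stricter accounting: pending downloads in the temp folder
// count against the podcast space limit too.
using System.IO;

class PodcastSpaceBudget
{
    public static bool WouldExceedLimit(string podcastFolder, string tempFolder, long maxBytes)
    {
        return FolderSize(podcastFolder) + FolderSize(tempFolder) > maxBytes;
    }

    static long FolderSize(string path)
    {
        long total = 0;
        foreach (string file in Directory.GetFiles(path))
            total += new FileInfo(file).Length;
        return total;
    }
}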

Then there's the question of what to actually do when the limit is hit. Do we prompt the user to delete old files? If so, what interface do we provide to keep the flow sensible and not irritating, especially since some of the files will be podcasts in the podcast folder and others will be incomplete files pending download in a temp folder? Yeah, and it goes on and on.

However, all our users will see is that one checkbox and a field to enter a numeric value.


 

Categories: RSS Bandit

December 12, 2006
@ 02:29 AM

I've had a number of people mention the article about Steve Berkowitz and MSN/Windows Live in the New York Times entitled Looking for a Gambit to Win at Google's Game, which contains a bunch of choice negative quotes about our products supposedly from Steve Berkowitz. The article starts off without pulling punches, as you can see from the following excerpt

The pressure is on for Mr. Berkowitz to gain control of Microsoft’s online unit, which by most measures has drifted dangerously off course. Over the last year, its online properties have lost users in the United States. The billions of dollars the company has spent building its own search engine have yet to pay off. And amid a booming Internet market, Microsoft’s online unit is losing money.

Google, meanwhile, is growing, prospering, and moving increasingly onto Microsoft’s turf.

Microsoft lost its way, Mr. Berkowitz says, because it became too enamored with software wizardry, like its new three-dimensional map service, and failed to make a search engine people liked to use.

“A lot of decisions were driven by technology; they were not driven by the consumer,” he said. “It isn’t always the best technology that wins. It is the best experience.”
...
Mr. Berkowitz does not defend the brand choice he inherited.

“I don’t know if Live is the right name,” he said, saying he had not decided what to do about it. But before he gets around to deciding whether to change the brand, he wants to make Microsoft’s search engine itself more appealing to consumers.

What he did decide was to keep the MSN name afloat, too, as it is well known and its various services have 430 million users around the world. He promoted Joanne K. Bradford, Microsoft’s head of advertising sales, to oversee and revive the MSN portal.

Definitely some harsh words attributed to our corporate VP, which have led some Windows Live watchers to wonder whether the brand is going to be tossed. I'm going to ignore the obvious flame bait of seeing an article claiming that one of our corporate vice presidents criticized what is probably the only best-of-breed online service we provide (i.e. http://maps.live.com) and instead focus on an implicit yet incorrect assumption carried throughout the article: the assumption that Steve Berkowitz runs Windows Live.

I've commented on our org chart before, but here is a refresher course for the reporters and bloggers out there who feel compelled to write about Windows Live and MSN. If you go back to the press release after our last major reorg, Microsoft Realigns Platforms & Services Division for Greater Growth and Agility, you'll notice that it breaks out Microsoft's Internet business into the following three pieces

Windows and Windows Live Group
With Sinofsky in charge, the Windows and Windows Live Group will have engineering teams focused on delivering Windows and engineering teams focused on delivering the Windows Live experiences. Sinofsky will work closely with Microsoft CTO Ray Ozzie and Blake Irving to support Microsoft’s services strategy across the division and company.
Windows Live Platform Group
Blake Irving will lead the newly formed Windows Live Platform Group, which unites a number of MSN teams that have been building platform services and capabilities for Microsoft’s online offerings. This group provides the back-end infrastructure services, platform capabilities and global operational support for services being created in Windows Live, Office Live, and other Microsoft and third-party applications that use the Live platform. This includes the advertising and monetization platforms that support all Live service offerings.
Online Business Group
The new Online Business Group includes advertising sales, business development and marketing for Live Platforms, Windows Live and MSN — including MSN.com, MSNTV and MSN Internet Access. David Cole, senior vice president, will lead this group until his successor is named before his leave of absence at the end of April. [Dare - Steve Berkowitz is the replacement]

As you can see from the above press release, Steve Berkowitz owns the sales, marketing and business aspects of Windows Live but not the products themselves. Steven Sinofsky and his subordinates, specifically Chris Jones and Christopher Payne, are responsible for Windows Live. Although Steve Berkowitz is probably the right guy to talk to about the marketing and branding of Windows Live, he probably isn't the right person to talk to about the future of Windows Live products like search (holla at Christopher Payne) or email/IM/blogging (talk to Chris Jones).

I find it interesting to see articles like NY Times: Will Berkowitz keep Windows Live? because, although things are confusing now with two poorly differentiated and overlapping brands, I think it would send the wrong signal to the market, our competitors and our customers if we decided to go back to the MSN brand for all our online services. What do you think?


 

Categories: MSN | Windows Live