September 29, 2004
@ 08:33 AM

The Bloglines press release entitled New Bloglines Web Services Selected by FeedDemon, NetNewsWire and Blogbot to Eliminate RSS Bandwidth Bottleneck has this interesting bit of news

Redwood City, Calif.--September 28, 2004 -- Three leading desktop news feed and blog aggregators announced today that they have implemented new open application programming interfaces (API) and Web Services from Bloglines (www.bloglines.com) that connect their applications to Bloglines' free online service for searching, subscribing, publishing and sharing news feeds, blogs and rich web content. FeedDemon (www.bradsoft.com), NetNewsWire (www.ranchero.com), and Blogbot (www.blogbot.com) are the first desktop software applications to use the open Bloglines Web Services.

Bloglines Web Services address a key issue facing the growing RSS market by reducing the bandwidth demands on sites serving syndicated news feeds. Now, instead of thousands of individual desktop PCs independently scanning news sources, blogs and web sites for updated feeds, Bloglines will make low-bandwidth requests to each site on behalf of the universe of subscribers and cache any updates to its master web database. Bloglines will then redistribute the latest content to all the individuals subscribed to those feeds via the linked desktop applications -- FeedDemon, NetNewsWire or Blogbot -- or via Bloglines' free web service.
...
Bloglines Web Services Enable Synchronization for Desktop News Aggregators "Our customers have been looking for the ability to synchronize their feed subscriptions across multiple computers," said Nick Bradbury, founder of Bradbury Software and creator of FeedDemon, the leading RSS aggregator for Windows. "By partnering with Bloglines, we are now able to offer the rich desktop functionality FeedDemon customers have come to expect, with the flexible mobility and portability of a web service."

There are two aspects of this press release I'm skeptical about. The first is the claim that having desktop aggregators fetch feeds from Bloglines instead of from the original sources somehow "eliminates the RSS bandwidth bottleneck". It seems to me that the Bloglines proposal does the opposite. Instead of thousands of desktop aggregators fetching tens of thousands to hundreds of thousands of feeds from as many websites, it is proposed that they all ping the Bloglines servers. That sounds like creating a bottleneck to me, not eliminating one.

The second aspect of the proposal I call into question is the Bloglines Sync API. The information on this API is quite straightforward:

The Bloglines Sync API is used to access subscription information and to retrieve blog entries. The API currently consists of the following functions:

  • listsubs - The listsubs function is used to retrieve subscription information for a given Bloglines account.
  • getitems - The getitems function is used to retrieve blog entries for a given subscription.

All calls use HTTP Basic authentication. The username is the email address of the Bloglines account, and the password is the same password used to access the account through the Bloglines web site.

I was interested in using this API to round out the existing feed synchronization support in RSS Bandit. In current versions a user can designate a file share, WebDAV server or FTP server as the central location for synchronizing multiple instances of RSS Bandit. I investigated what it would take to add Bloglines as a fourth synchronization point after reading the aforementioned press release and came to the conclusion that the API provided by Bloglines falls short of providing the functionality that exists in RSS Bandit today with the other synchronization sources.  
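For concreteness, fetching the subscription list from .NET would look something like the sketch below. This is not code from RSS Bandit, just an illustration of the shape of the API; the rpc.bloglines.com/listsubs endpoint URL is my reading of the Bloglines documentation.

using System;
using System.IO;
using System.Net;

class BloglinesSyncSketch
{
    // Fetch the Bloglines subscription list over HTTP Basic authentication.
    // The endpoint URL below is an assumption based on the Bloglines Sync API docs.
    static string ListSubs(string email, string password)
    {
        HttpWebRequest request =
            (HttpWebRequest) WebRequest.Create("http://rpc.bloglines.com/listsubs");
        request.Credentials = new NetworkCredential(email, password);

        HttpWebResponse response = (HttpWebResponse) request.GetResponse();
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            // The response is the user's subscription list. Note that there is no
            // corresponding call for adding, deleting or modifying a subscription;
            // that can only be done through the Bloglines web site.
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        Console.WriteLine(ListSubs("user@example.com", "password"));
    }
}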

The problems with the Bloglines Sync API include

  1. The Bloglines Sync API only allows clients to retrieve the subscribed feeds. The user has to log in to the Bloglines site to perform feed management tasks like adding, deleting or modifying the feeds to which they are subscribed.
  2. There is no granular mechanism to get or set the read/unread state of the items in the user's feed list. 

These limitations don't make using the Bloglines Sync API a terribly useful way for synchronizing between two desktop aggregators. Instead, it primarily acts as a way for Bloglines to use various desktop aggregators as a UI for viewing a user's Bloglines subscriptions without the Bloglines team having to build a rich client application.

Thanks, but I think I'm going to pass.


 

September 27, 2004
@ 07:28 AM

Today my TiVo disappointed me for what I feel is the last time. Yesterday I set it to record four never-before-aired episodes of Samurai Jack. Sometime between 6:30 PM and 11 PM this evening, the TiVo decided that recording suggestions was more important than keeping around one of the episodes of Samurai Jack.

For the past few months I've been disappointed with the TiVo's understanding of priority when keeping recordings. For example, it shouldn't delete a first-run episode over a rerun, especially when the Season Pass for the deleted episode is set to record only first runs. This is a pretty basic rule I'm sure I could write myself if I had access to the TiVo source code. This last mistake is the straw that broke the camel's back, and I'm now seeking a replacement for TiVo.

I now realize I prefer an Open Source solution so I can hack it myself. Perhaps I should take a look at MythTV.


 

Categories: Technology

September 26, 2004
@ 07:10 PM

As an author of a news reader that supports RSS and Atom, I often have to deal with feeds designed by the class of people Mark Pilgrim described in his post Why specs matter as assholes. These are people who

read specs with a fine-toothed comb, looking for loopholes, oversights, or simple typos.  Then they write code that is meticulously spec-compliant, but useless.  If someone yells at them for writing useless software, they smugly point to the sentence in the spec that clearly spells out how their horribly broken software is technically correct

This is the first in a series of posts highlighting such feeds as an example to others of how not to design syndication feeds for a website. Feeds in this series will often be technically valid RSS/Atom feeds but for one or more reasons cause unnecessary inconvenience to authors and users of news aggregators.

This week's gem is the Cafe con Leche RSS feed. Instead of pointing out what is wrong with this feed myself I'll let the author of the feed do so himself. On September 24th Elliotte Rusty Harold wrote

I've been spending a lot of time reviewing RSS readers lately, and overall they're a pretty poor lot. Latest example. Yesterday's Cafe con Leche feed contained this completely legal title element:

<title>I'm very pleased to announce the publication of XML in a Nutshell, 3rd edition by myself and W.
          Scott Means, soon to be arriving at a fine bookseller near you.
          </title>

Note the line break in the middle of the title content. This confused at least two RSS readers even though there's nothing wrong with it according to the RSS 0.92 spec. Other features from my RSS feeds that have caused problems in the past include long titles, a single URL that points to several stories, and not including more than one day's worth of news in a feed.

Elliotte is technically right: none of the RSS specs say that the <link> element in an RSS feed should be unique for each item, so he can reuse the same link for multiple items and still have a valid RSS feed. So why does this cause problems for RSS aggregators?

Consider the following RSS feed

<rss version="0.92">
  <channel>
    <title>Example RSS feed</title>
    <link>http://www.example.com</link>
    <description>This feed contains an example of how not to design an RSS feed</description>  
    <item>
      <title>I am item 1</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
     <item>
      <title>I am item 2</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
  </channel>
</rss>

Now consider the same feed fetched a few hours later

<rss version="0.92">
  <channel>
    <title>Example RSS feed</title>
    <link>http://www.example.com</link>
    <description>This feed contains an example of how not to design an RSS feed</description>  
    <item>
      <title>I am item one</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
     <item>
      <title>I am item 3</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
    <item>
      <title>I am item 2</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
  </channel>
</rss>

Now how does the RSS aggregator tell whether the item with the title "I am item 1" is the same item with a typo in the title fixed, now reading "I am item one", or a different item entirely? The simple answer is that it can't. A naive hack is to compare the content of the <description> element, but what happens when a typo in the description is fixed or its content is otherwise updated?

Every RSS aggregator has some sort of hack to deal with this problem. I describe them as hacks because there is no way that an aggregator can 100% accurately determine when items with the same link and no guid are the same item with content changed or different items. This means the behavior of different aggregators with feeds such as the Cafe con Leche RSS feed is extremely inconsistent.
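RSS Bandit's own logic is more involved, but the general shape of such a hack looks something like the sketch below (an illustration, not the actual RSS Bandit code): prefer a guid when the publisher supplies one, otherwise fall back to a fingerprint of the item's content, which is exactly where edited items and new items become indistinguishable.

using System;
using System.Security.Cryptography;
using System.Text;

class ItemIdentitySketch
{
    // Compute a stand-in identity for an RSS item. When the feed reuses the same
    // <link> for several items and provides no <guid>, the title and description
    // are all that is left to tell items apart -- so a fixed typo or an edited
    // description makes an old item look brand new.
    static string IdentityKey(string guid, string link, string title, string description)
    {
        if (guid != null && guid.Length != 0)
            return guid; // the reliable case: the publisher supplied a unique id

        string fingerprint = link + "|" + title + "|" + description;
        byte[] hash = new MD5CryptoServiceProvider().ComputeHash(Encoding.UTF8.GetBytes(fingerprint));
        return Convert.ToBase64String(hash);
    }

    static void Main()
    {
        string before = IdentityKey(null, "http://www.example.com/rssitem", "I am item 1", "");
        string after  = IdentityKey(null, "http://www.example.com/rssitem", "I am item one", "");

        // The corrected title produces a different key, so the aggregator
        // cannot tell an edit from a brand new item.
        Console.WriteLine(before == after); // False
    }
}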

A solution to this problem is for Elliotte Rusty Harold to upgrade his RSS feed to RSS 2.0 and use guid elements to distinctly identify items.


 

September 26, 2004
@ 05:42 PM

As I mentioned in my post News Aggregators As Denial of Service Clients (part 2) 

the weblog software I use, dasBlog, does not support HTTP Conditional GET for comments feeds so I'm serving dozens of XML files to each user of Newzcrawler and SauceReader subscribed to my RSS feed every hour.

It also turned out that the newest version of dasBlog stopped supporting HTTP Conditional GET for category-specific feeds when I upgraded from 1.5 to 1.6. This meant I was wasting a huge amount of bandwidth since thousands of RSS Bandit users are subscribed to the feed for my RSS Bandit category.

I decided to download the dasBlog source code and patch my local instance. As I expected it took longer to figure out how to configure ASP.NET and Visual Studio to allow me to compile my own blog software than it did to fix the problem. I guess that's a testament to how well the dasBlog code is written.
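For those unfamiliar with HTTP Conditional GET, the essence of the fix is small: if the client sends an If-Modified-Since header and the feed hasn't changed since that date, return 304 Not Modified with no body instead of re-serving the entire XML document. The sketch below shows the idea in a bare-bones ASP.NET handler; it is not the actual dasBlog patch, and the two helper methods are hypothetical placeholders.

using System;
using System.Globalization;
using System.Web;

public class CommentFeedHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        DateTime lastModified = GetFeedLastModified(context); // hypothetical helper

        // If the client already has an up-to-date copy, a 304 with no body is all
        // we need to send (a real handler would guard against malformed dates).
        string ifModifiedSince = context.Request.Headers["If-Modified-Since"];
        if (ifModifiedSince != null)
        {
            DateTime since = DateTime.ParseExact(ifModifiedSince, "r", CultureInfo.InvariantCulture);
            if (lastModified <= since)
            {
                context.Response.StatusCode = 304; // Not Modified
                return;
            }
        }

        // Otherwise serve the feed and advertise its last modified date so the
        // client can make a conditional request next time around.
        context.Response.ContentType = "text/xml";
        context.Response.AppendHeader("Last-Modified", lastModified.ToString("r"));
        WriteFeed(context); // hypothetical helper that writes the RSS document
    }

    public bool IsReusable { get { return true; } }

    DateTime GetFeedLastModified(HttpContext context) { return DateTime.UtcNow; } // placeholder
    void WriteFeed(HttpContext context) { }                                       // placeholder
}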

Mad props go out to Omar, Clemens and the rest of the dasBlog crew.


 

Categories: Ramblings

In her post Blog Activity Julia Lerman writes

There must be a few people who have their aggregators set to check rss feeds every 10 seconds or something. I very rarely look at my stats because they don't really tell me much. But I have to say I was a little surprised to see that there were over 14,000 hits to my website today (from 12am to almost 5pm).

So where do they come from?

10,000+ are from NewzCrawler then a whole lot of other aggregators and then a small # of browsers. 

This problem is due to the phenomenon originally pointed out by Phil Ringnalda in his post What's Newzcrawler Doing? and expounded on by me in my post News Aggregators As Denial of Service Clients. Basically 

According to the answer on the NewzCrawler support forums when NewzCrawler updates the channel supporting wfw:commentRss it first updates the main feed and then it updates comment feeds. Repeatedly downloading the RSS feed for the comments to each entry in my blog when the user hasn't requested them is unnecessary and quite frankly wasteful.  

Recently I upgraded my web server to Windows 2003 Server due to problems with the connection limit in Windows XP. However, I noticed that my web server was still getting overloaded with requests during hours of peak traffic. Checking my server logs I found out that another aggregator, Sauce Reader, has joined Newzcrawler in its extremely rude bandwidth hogging behavior. This is compounded by the fact that the weblog software I use, dasBlog, does not support HTTP Conditional GET for comments feeds so I'm serving dozens of XML files to each user of Newzcrawler and SauceReader subscribed to my RSS feed every hour.

I'm really irritated at this behavior and have considered banning Sauce Reader & Newzcrawler from fetching RSS feeds on my blog due to the fact that they significantly contribute to bringing down my site on weekday mornings when people first fire up their aggregators at work or at home.  Instead, I'll probably end up patching my local install of dasBlog to support HTTP conditional GET for comments feeds when I get some free time. In the meantime I've tweaked some options in IIS that should reduce the amount of times the server is inaccessible due to being flooded with HTTP requests.

This doesn't mean I think this feature of the aforementioned aggregators is something that should be encouraged. I just don't want to punish readers of my blog because of decisions made by the authors of their news reading software.


 

In a post entitled When will Scoble earn his Longhorn pay? Robert Scoble writes

The thing is that I don't have any credibility left when it comes to Longhorn. Over the last 18 months I got out there and lead lots of Longhorn cheers. And now there's a changing of direction.

Tons of people, both inside and outside of Microsoft, have been talking with me about where we're going now. I've met in the past week with the Avalon and WinFS teams (yes, they both still exist).

The thing is, I am super sensitive right now to making a whole new round of promises. I'd rather wait to talk until there's beta build to hand you. Why? Cause what good does it do to write about the feature set if you can't see it? And if you're a developer, you don't want to hear FUD, you wanna see working APIs.

Shortly after Robert joined Microsoft I sent him a link to Joel Spolsky's Mouth Wide Shut article because I thought he was going overboard in pimping Longhorn. Experience working with product teams at Microsoft had already taught me that until a technology is in beta almost everything about it can change. For example, the plans my team had for what we were shipping in Whidbey two years ago are very different from what we planned to ship a year ago, which in turn is very different from what we plan to ship today. Features get cut all the time, priorities change, and then there's the date-driven release dance.

Microsoft has always had a credibility problem due to what people have termed vaporware announcements. Although many have assumed that the company does this maliciously the truth of the matter is that a lot of these incidents are product teams prematurely announcing their plans to the world. Personally I think Microsoft's evangelists and marketing folks could do the company, our customers and the software industry in general a service by shutting the hell up about future product plans until they were more than a glimmer in some software architect's eye.

Borrowing a leaf from Apple doesn't sound so bad right about now.


 

Categories: Life in the B0rg Cube

My article Improving XML Document Validation with Schematron is finally up on MSDN. It provides a brief introduction to Schematron, shows how to embed Schematron assertions in a W3C XML Schema document for improved validation capabilities and how to get the power of Schematron in the .NET Framework today. The introduction of the article is excerpted below

Currently the most popular XML schema language is the W3C XML Schema Definition language (XSD). Although XSD is capable of satisfying scenarios involving type annotated infosets it is fairly limited when it comes to describing constraints on the structure of an XML document. There are many examples of situations where common idioms in XML vocabulary design are impossible to express using the constraints available in W3C XML Schema. The three most commonly requested constraints that are incapable of being described by W3C XML Schema are:

  1. The ability to specify a choice of attributes. For example, a server-status element should either have a server-uptime attribute or a server-downtime attribute.

  2. The ability to group elements and attributes into model groups. Although one can group elements using compositors such as xs:sequence, xs:choice, and xs:all, the same cannot be done with both elements and attributes. For example, one cannot create a choice between one set of elements and attributes and another.

  3. The ability to vary the content model based on the value of an element or attribute. For example, if the value of the status attribute is "available" then the element should have an uptime child element; otherwise it should have a downtime child element. The technical name for such constraints is co-occurrence constraints.

Although these idioms are widely used in XML vocabularies it isn't possible to describe them using W3C XML Schema, which makes it difficult to rely on schema validation for enforcing the message contract. This article describes how to layer such functionality on top of the W3C XML Schema language using Schematron.

Embedding Schematron assertions in a W3C XML Schema document allows you to have your cake and eat it too.
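At its core a Schematron assertion is just an XPath expression evaluated against the document, which is why it can express constraints like the three above. As a rough illustration of that idea (not the embedding mechanism the article describes), the first constraint, a choice of attributes on the server-status element, boils down to evaluating an expression like this; the file name is a placeholder.

using System;
using System.Xml.XPath;

class SchematronAssertionSketch
{
    static void Main()
    {
        // Constraint: a server-status element must carry either a server-uptime
        // or a server-downtime attribute -- the "choice of attributes" case that
        // W3C XML Schema cannot express on its own.
        XPathDocument doc = new XPathDocument("server-status.xml");
        XPathNavigator nav = doc.CreateNavigator();

        bool ok = (bool) nav.Evaluate(
            "boolean(/server-status[@server-uptime or @server-downtime])");

        Console.WriteLine(ok
            ? "assertion holds"
            : "assertion failed: expected a server-uptime or server-downtime attribute");
    }
}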


 

Categories: XML

September 19, 2004
@ 10:14 PM

Tim Bray has another rant on the proliferation of WS-* specs in the XML Web Services world. In his post The Loyal WS Opposition he writes

I Still Don't Buy It. No matter how hard I try, I still think the WS-* stack is bloated, opaque, and insanely complex. I think it's going to be hard to understand, hard to implement, hard to interoperate, and hard to secure.

I look at Google and Amazon and EBay and Salesforce and see them doing tens of millions of transactions a day involving pumping XML back and forth over HTTP, and I can't help noticing that they don't seem to need much WS-apparatus.

One way to view the various WS-* specifications is that they are akin to Java Specification Requests (JSRs) in the Java world. A JSR is basically a way for various Java vendors to standardize on a mechanism for solving a particular customer problem. Usually this mechanism takes the form of an Application Programming Interface (API). Some JSRs are widely adopted and have become an integral aspect of programming on the Java platform (e.g. the JAXP JSR). Some JSRs are pushed by certain vendors while being ignored by others leading to overlap (e.g. the JDO JSR which was voted against by BEA, IBM and Oracle but supported by Macromedia and Sun). Then there's Enterprise Java Beans which is generally decried as a bloated and unnecessarily complex solution to business problems. Again that was the product of the JSR process.

The various WS-* specs are following the same pattern as JSRs, which isn't much of a surprise since a number of the players are the same (e.g. Sun & IBM). Just as Tim Bray points out that one can be productive without adopting any of the WS-* family of specifications, it is similarly true that one can be productive in Java without relying on the products of JSRs and instead rolling one's own solutions. However, this doesn't mean there aren't benefits to standardizing on high-level mechanisms for solving various business problems beyond saying "We use XML and HTTP so we should interop".

Omri Gazitt, the Product Unit Manager of the Advanced XML Web Services team, has a post on WS-Transfer and WS-Enumeration which should hit close to home for Tim Bray since he is the co-chair of the Atom working group

WS-Transfer is a spec that Don has wanted to publish for a year now.  It codifies the simple CRUD pattern for Web services (the operations are named after their HTTP equivalents - GET, PUT, DELETE, and there is also a CREATE pattern.  The pattern of manipulating resources using these simple verbs is quite prevalent (Roy Fielding's REST is the most common moniker for it), and of course it underlies the HTTP protocol.  Of course, you could implement this pattern before WS-Transfer, but it does help to write this down so people can do this over SOAP in a consistent way.  One interesting existing application of this pattern is Atom (a publishing/blogging protocol built on top of SOAP).  Looking at the Atom WSDL, it looks very much like WS-Transfer - a GET, PUT, DELETE, and POST (which is the CREATE verb specific to this application).  So Atom could easily be built on top of WS-Transfer.  What would be the advantage of that?  The same advantage that comes with any kind of consistent application of a technology - the more the consistent pattern is applied, the more value it accrues.  Just the value of baking that pattern into various toolsets (e.g. VS.NET) makes it attractive to use the pattern. 

I personally think WS-Transfer is very interesting because it allows SOAP-based applications to model themselves as REST Web Services and get explicit support for this methodology from toolkits. I talked about WS-Transfer with Don a few months ago and I've had to bite my tongue for a while whenever I hear people complain that SOAP and SOAP-based toolkits don't encourage building RESTful XML Web Services.

I'm not as impressed with WS-Enumeration, but I find it interesting that it also covers another use case of the Atom API, namely a mechanism for pulling down the content archive from a weblog or similar system in a sequential manner.


 

Categories: Technology | XML

September 19, 2004
@ 08:44 PM

I saw Ghost in the Shell 2: Innocence last night. I'm not a big fan of some of the critically acclaimed, plot-driven members of the anime genre such as the original Ghost in the Shell and Akira. I always thought they didn't have enough action, nor did they delve deeply enough into the philosophical questions they raised. Thus I have tended to prefer anime that doesn't have any pretension of intellectual depth and is just a straight violence fest, such as Crying Freeman, Ninja Scroll and most recently Berserk.

Ghost in the Shell 2 struck a happy balance for me. It had excellent action, especially the scene where Batou visits a local Yakuza hangout with a heavy machine gun. I also thought the exploration of what it means to be truly human in a world where the line between man and machine is continually blurred was better done in Innocence than in the original Ghost in the Shell. It just seemed a lot less heavy handed.

I'm definitely going to pick up the Ghost in the Shell TV series from Fry's later this week

Rating: **** out of *****


 

Categories: Movie Review

September 19, 2004
@ 08:00 PM

Reading a post on Dave Winer blog I caught the following snippet

NY Times survey of spyware and adware. "...a program that creeps onto a computer’s hard drive unannounced, is wrecking the Internet." 

I've noticed that every time I sit at the computer of a non-technical Windows user I end up spending at least an hour removing spyware from their computer. Yesterday, I encountered a particularly nasty piece of work that detected when the system was being scanned by Ad-Aware and forced a system reboot. I'd never realized how inadequate the functionality of the Add or Remove Programs dialog was for removing applications from your computer until spyware came around. After I was done yesterday, I began to suspect that some of the spyware that was polite enough to add an entry in "Add or Remove Programs" simply took the Uninstall instruction as a command to go into stealth mode. One application made you fill out a questionnaire before it let you uninstall it. I wondered if it would refuse to uninstall if it didn't like my answers to its questions.

Something definitely has to be done about this crap. In the meantime I suggest using at least two anti-spyware applications if attempting to clean a system. I've already mentioned Ad-Aware, my other recommendation is Spybot - Search & Destroy.


 

Categories: Technology

Reading Gretchen's blog today I saw a post entitled Do you have any questions for me? where she wrote

One of the most mysterious questions of any interview is usually the last one asked … “Do you have any questions for me?”  Wow.  What a loaded question!  What do you say?  Is it a test?  Is it a genuine request for inquiries?  Who knows!

Well, unfortunately, I don’t have a clear-cut answer to this question.  I’ve yet to figure it out myself.  The best advice I can give is that the motive behind the question and the way in which you should respond really varies from interviewer to interviewer and situation to situation.

I've always taken this as an opportunity to figure out if the team I am interviewing with is a good fit for me. Over time, I've come up with a short list of questions which I plan to use if I'm ever interviewing around Microsoft, based on my knowledge of how things work here. The 5 topics listed below are the biggest causes of unhappiness and morale issues in teams across Microsoft. Teams that don't know their competitors, teams that cut projects due to poor planning, and teams that require people to work insane hours are just a few of the reasons I've seen people become frustrated in positions here.

  1. How do you make decisions such as adding new features, entering a new competitive space or cutting features?

  2. How do you contribute to your product unit's bottom line? If you don't, are you strategic?

  3. Who are your competitors? Do you have any overlap with other Microsoft products?

  4. Ask about work/life balance: what is the policy on flex time, how early are the earliest meetings, what kind of hours do people work on average, etc.

  5. What are my growth prospects on this team?

  6.  What is the worst thing about your job?

Question 6 is a bonus question. The first 5 questions should root out whether the team is an unhealthy work environment or not. The last one is more specific to the individual interviewing you but may give insight into your future manager(s).  Remember to ask everyone who interviews you these questions so you can compare notes.


 

Categories: Life in the B0rg Cube

September 15, 2004
@ 02:52 PM

Yesterday in my post Killing the "WinFS is About Making Search Better" Myth I wrote

Now this isn't to say that there aren't some searches made better by coming up with a consistent way to interact with certain file types and providing structured metadata about these files. For example a search like

Get me all the songs [regardless of file type] either featuring or created by G-Unit or any of its members (Young Buck, 50 Cent, Tony Yayo or Lloyd Banks) between 2002 and 2004 on my hard drive

is made possible with this system. However it is more likely that I want to navigate this in a UI like the iTunes media library than I want to type the equivalent of SQL queries over my file system.

I just found out that I can already do stuff like this today in iTunes. I can create a playlist by querying based on artist, song name, genre, year, rating, album and a lot more. I've been wishing for this functionality ever since I bought an iPod. Apple fucking rocks. I'll never use WinAmp again.


 

Categories: Ramblings

In the comments to my recent blog post RSS Bandit Gets Around Phil Weber asked

Dare: In addition to manual deletion, please consider offering an option to "Automatically delete posts older than X days." Flagged posts should not be auto-deleted. Thank you!

This functionality already exists in RSS Bandit. You can set it on a per feed level and have a global default setting as well. The user interface for it is misleading which is probably why most people don't know it exists. The fields called "Max Item Age" and "Default Maximum Item Age" should actually be labelled as "Delete Items Older Than" instead. I'll probably change the text in the code after posting this blog entry. Click below to see screenshots of this feature.
 

Categories: RSS Bandit

I recently read an InfoWorld article entitled Gartner: Ignore Longhorn and stick with XP where it states

Microsoft Corp. may choose never to release its vaunted and long-overdue project WinFS, following its removal from the next version of Windows, according to analysts Gartner Inc.
...
Microsoft has said Longhorn will still include local desktop searching as a taste of the power of WinFS' relational database capabilities, but Gartner sees this as a hint that WinFS may never arrive. "Because Microsoft has committed to improving search without WinFS, it may choose never to deliver the delayed WinFS," Gartner said.

The fundamental premise of the above statements is that the purpose of WinFS is to make local desktop search better or to use a cruder term to create "Google for the Desktop". It may be true that when it first started getting pitched one of the scenarios people described was making search better. However as WinFS progressed the primary scenarios its designers focused on enabling didn't have much to do with search. If you read Longhorn evangelist Jeremy Mazner's blog posting entitled What happened to WinFS? posted after the Longhorn changes were announced you'll find the following excerpt

The WinFS team spent a solid couple weeks going through this evaluation.  There are of course plenty of things you could do to increase the confidence level on a project the size of WinFS, since it has so many features, including:

  • Built-in schemas for calendar, contacts, documents, media, etc
  • Extensibility for adding custom schema or business logic
  • File system integration, like promotion/demotion and valid win32 file paths
  • A synchronization runtime for keeping content up to date
  • Rich API support for app scenarios like grouping and filtering
  • A self-tuning management service to keep the system running well
  • Tools for deploying schema, data and applications

The above feature list is missing the recent decision to incorporate the features of the object relational mapping technology codenamed ObjectSpaces into WinFS. Taking all these features together none of them is really focused on making it easier for me to find things on my desktop.

At its core, WinFS was about storing strongly typed objects in the file system instead of opaque blobs of bits. The purpose of doing this was to make accessing and manipulating the content and metadata of these files simpler and more consistent. For example, instead of having to know how to manipulate JPEG, TIFF, GIF and BMP files there would just be a Photo item type that applications would have to deal with. Similarly one could imagine just interacting with a built in Music item instead of programming against MP3, WMA, OGG, AAC, and WAV files. In talking to Mike Deem a few months ago and recently seeing Bill Gates discuss his vision for WinFS to folks in our building a few weeks ago it is clear to me that the major benefits of WinFS to end users is the possibilities it creates in user interfaces for data organization.

Recently I switched from using WinAmp to iTunes on the strength of the music organizational capabilities of the iTunes library and "smart playlists". The strength of iTunes is that it provides a consistent interface for interacting with music files regardless of their underlying type (AAC, MP3, etc), provides ways to add metadata about these music files (ratings, number of times played) and then organizes these files according to this metadata. Another application that shows the power of data organization based on rich, structured metadata is Search Folders in Outlook 2003. When I used to think of WinFS I got excited about being able to perform SQL-like queries over items in the file system. Then I heard Bill Gates and Mike Deem speak about WinFS, saw them get excited about taking the data organizational capabilities of features like the My Pictures and My Music folders in Windows to the next level, and it all clicked.

Now this isn't to say that there aren't some searches made better by coming up with a consistent way to interact with certain file types and providing structured metadata about these files. For example a search like

Get me all the songs [regardless of file type] either featuring or created by G-Unit or any of its members (Young Buck, 50 Cent, Tony Yayo or Lloyd Banks) between 2002 and 2004 on my hard drive

is made possible with this system. However it is more likely that I want to navigate this in a UI like the iTunes media library than I want to type the equivalent of SQL queries over my file system.

More importantly, this system doesn't make it much easier to find stuff I've lost on my file system, like Java code I wrote while in college or drafts of articles created several years ago that I never finished. When I think "Google on the Desktop", that's the problem I want to see solved. However MSN just bought LookOut so I have faith that we will be solving this problem in the near future as well.


 

Categories: Technology

September 14, 2004
@ 08:45 AM

Oleg has just announced a new release of EXSLT.NET; his post is excerpted below

Here we go again - I'm pleased to announce EXSLT.NET 1.1 release. It's ready for download. The blurb goes here:

EXSLT.NET library is community-developed free open-source implementation of the EXSLT extensions to XSLT for the .NET platform. EXSLT.NET fully implements the following EXSLT modules: Dates and Times, Common, Math, Random, Regular Expressions, Sets and Strings. In addition EXSLT.NET library provides proprietary set of useful extension functions.

Download EXSLT.NET 1.1 at the EXSLT.NET Workspace home - http://workspaces.gotdotnet.com/exslt
EXSLT.NET online documentation - http://www.xmland.net/exslt

EXSLT.NET Features:

  • 65 supported EXSLT extension functions
  • 13 proprietary extension functions
  • Support for XSLT multiple output via exsl:document extension element
  • Can be used not only in XSLT, but also in XPath-only environment
  • Thoroughly optimized for speed implementation of set functions

Here is what's new in this release:

  • New EXSLT extension functions has been implemented: str:encode-uri(), str:decode-uri(), random:random-sequence().
  • New EXSLT.NET extension functions has been implemented: dyn2:evaluate(), which allows to evaluate a string as an XPath expression, date2:day-name(), date2:day-abbreviation(), date2:month-name() and date2:month-abbreviation() - these functions are culture-aware versions of the appropriate EXSLT functions.
  • Support for time zone in date-time functions has  been implemented.
  • Multithreading issue with ExsltTransform class has been fixed. Now ExsltTransform class is thread-safe for Transform() method calls just like the  System.Xml.Xsl.XslTransform class.
  • Lots of minor bugs has been fixed. See EXSLT.NET bug tracker for more info.
  • We switched to Visual Studio .NET 2003, so building of the project has been greatly simplified.
  • Complete suite of NUnit tests for each extension function has been implemented (ExsltTest project).

The EXSLT.NET project has come quite some way since I started it last year. Oleg has done excellent work with this release. It's always great to see the .NET Open Source community come together this way.
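If you haven't tried the library, using it is meant to be a drop-in affair. The sketch below assumes the GotDotNet.Exslt namespace and that ExsltTransform mirrors the Load/Transform signatures of System.Xml.Xsl.XslTransform, as the release notes above imply; check the EXSLT.NET documentation for the exact API.

using GotDotNet.Exslt; // namespace is my assumption -- see the EXSLT.NET docs

class ExsltNetSketch
{
    static void Main()
    {
        // ExsltTransform is described as a drop-in replacement for
        // System.Xml.Xsl.XslTransform with the EXSLT extension functions
        // (date:*, str:*, set:*, regexp:*, etc.) already registered.
        ExsltTransform transform = new ExsltTransform();
        transform.Load("report.xslt");                 // a stylesheet that uses EXSLT functions
        transform.Transform("input.xml", "report.html");
    }
}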


 

Categories: XML

September 14, 2004
@ 08:35 AM

I installed Windows 2003 Server on the machine that this weblog runs on this morning. This should get rid of those pesky "Too Many Users" errors due to connection limits in Windows XP Professional which was the previous OS on the machine. It took me all day to figure out how to give ASP.NET write permissions for my weblog so if you attempted to post a comment in the past 24 hours and got an error message I apologize. Things should be fine now.

 


 

Categories: Ramblings | RSS Bandit

September 12, 2004
@ 02:17 AM

In his post Full text RSS on MSDN gets turned off Robert Scoble writes

Steve Maine: what the hell happened to blogs.msdn.com?

RSS is broken, is what happened. It's not scalable when 10s of thousands of people start subscribing to thousands of separate RSS feeds and start pulling down those feeds every few minutes (default aggregator behavior is to pull down a feed every hour).

Bandwidth usage was growing faster than MSDN's ability to pay for, or keep up with, the bandwidth. Terrabytes of bandwidth were being used up by RSS.

So, they are trying to attack the problem by making the feeds lighter weight. I don't like the solution (I've unsubscribed from almost all weblogs.asp.net feeds because they no longer provide full text) but I understand the issues.

This is becoming a broken record. Every couple of months some web site that hasn't properly prepared for the amount of bandwidth consumed by having a popular RSS feed loudly complains, and the usual suspects declare that RSS is broken. This time the culprit is Weblogs @ ASP.NET, and their mistake was not providing HTTP compression to clients speaking HTTP 1.0. This meant that they couldn't get the benefits of HTTP compression when talking to popular aggregators like Straw, FeedDemon, SharpReader, NewsGator and RSS Bandit. No wonder their bandwidth usage was so high.
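For what it's worth, nothing stops a site from honoring Accept-Encoding from HTTP 1.0 clients itself. Below is a rough ASP.NET sketch of the idea; it assumes SharpZipLib's GZipOutputStream (or any similar gzip stream, since the .NET Framework 1.x has no built-in gzip support) and glosses over details like Vary headers and already-compressed responses.

using System;
using System.Web;
using ICSharpCode.SharpZipLib.GZip; // assumption: SharpZipLib provides GZipOutputStream

public class CompressionModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += new EventHandler(OnBeginRequest);
    }

    void OnBeginRequest(object sender, EventArgs e)
    {
        HttpApplication app = (HttpApplication) sender;

        // Key point: look at what the client says it accepts, not at the HTTP
        // version it speaks. Plenty of aggregators send HTTP/1.0 requests but
        // handle gzip-encoded responses just fine.
        string acceptEncoding = app.Request.Headers["Accept-Encoding"];
        if (acceptEncoding != null && acceptEncoding.IndexOf("gzip") != -1)
        {
            app.Response.AppendHeader("Content-Encoding", "gzip");
            app.Response.Filter = new GZipOutputStream(app.Response.Filter);
        }
    }

    public void Dispose() { }
}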

But let's ignore the fact that the site wasn't properly configured to utilize all the bandwidth-saving capabilities of HTTP. Instead let's assume Weblogs @ ASP.NET had done all the right things but too much bandwidth was still being consumed. Mark Nottingham covered this ground in his post The Syndication Sky is Falling!

A few people got together in NYC to talk about Atom going to the W3C this morning. One part of the minutes of this discussion raised my eyebrows a fair amount;

sr: […] Lots of people are saying RSS won’t scale. Somebody is going to say I told you so.
bw: Werner Vogels at Cornell has charted it out. We're at the knee of the curve. I don’t think we have 2 years.
sr: I have had major media people who say, until you solve this, I’m not in.
bw: However good the spec is, unless we deal with the bag issues, it won’t matter. There are fundamental flaws in the current architecture.

Fundamental flaws? Wow, I guess I should remind the folks at Google, Yahoo, CNN and my old colleagues at Akamai that what they’re doing is fundamentally flawed; the Web doesn’t scale, sorry. I guess I’ll also have to tell the people at the Web caching workshops that what they do is futile, and those folks doing Web metrics are wasting their time. What a shame...

Bad Reasons to Change the Web Architecture

But wait, there’s more. "Media people" want to have their cake and eat it too. It’s not good enough that they’re getting an exciting, new and viable (as compared to e-mail) channel to eyeballs; they also have to throw their weight around to reduce their costs with a magic wand. What a horrible reason to foist new protocols, new software, and added complexity upon the world.

The amusing new wrinkle is that everybody's favorite leader of the "RSS is broken, let's start all over" crowd, Sam Ruby, has decided it is time to replace blogs pinging weblogs.com when they update and using HTTP to fetch RSS feeds. Hopefully, this will be more successful than his previous attempts to replace RSS and the various blogging APIs with Atom. It's been over a year and all we have to show from the creation of Atom is yet another crufty syndication format with the promise of one more incompatible one on the way.

Anyway, the point is that RSS isn't broken. After all, it is just an XML file format. If anything is broken it is using HTTP for fetching RSS feeds. But then again, do you see people complaining that HTTP is broken and needs to be replaced every time some poor web site suffers the Slashdot effect? If you are running a popular web site, you will need to spend money to afford the traffic. AOL.com, Ebay.com and Microsoft.com are all serving terabytes of content each month. If they were served on the same budget that I have for my website, these sites would roll over and die. Does this mean we should stop using web browsers and HTTP for browsing the Web and resort to BitTorrent for fetching HTML pages? It definitely would reduce the bandwidth costs of sites like AOL.com, Ebay.com and Microsoft.com.

The folks paying for the bandwidth that hosts Weblogs @ ASP.NET (the ASP.NET team, not MSDN as Scoble incorrectly surmises) decided they had reached their limits and reduced the content of the feeds. It's basically a non-story. The only point of interest is that if they had announced this with enough warning internally, folks would have advised them to turn on HTTP compression for HTTP 1.0 clients before resorting to crippling the RSS feeds. Act in haste, repent at leisure.


 

September 10, 2004
@ 05:29 PM

In a post entitled Report From the Intel Community Tim Bray writes

This has nothing to do with a California chip maker. Rather, it's about a trip I recently took to a conference called Intelink, where the people gather who run one of the world's biggest and most interesting intranets; the one that serves the community of U.S. Intelligence professionals
...
I was amused to note that on one of the sub-intranets distinguished by being loaded with particularly ultra-secret stuff, they were offering RSS Bandit for the people to download and use.

That's an awesome endorsement. I'm always surprised by the people I find using RSS Bandit whether it is a bunch of U.S. intelligence professionals or high school girls from Singapore. There's a lot that still needs to be done to make consuming information from syndication feeds a truly optimal experience but RSS Bandit gets closer to what I see as the ideal each day. The big focus for the next release will be making it easier to organize, locate and manage information within the aggregator.

Speaking of positive endorsements, here's one on the memory usage characteristics of RSS Bandit from Wesner Moise in his post .NET vs Native Performance

The working set for SharpReader is 30Mb, FeedDemon is 23 Mb, and RSS Bandit is 4 Mb in their initial configuration on my machine. (In comparison, the working set for MS Word and MS Excel are about 18 Mbs.) So, actually in their bare configuration, RSS Bandit is the tightest of them all, even considering that RSS Bandit also uses the .NET runtime. However, the working set of .NET applications have a significantly higher variance than native applications. While RSS Bandit was idle, I watch the working set figures initially progress to 13 MBs, then in an instant fall down to 6.5MB, as it appears a collection has occurred. The working set oscillated in an ever narrowing range (down to a range between <3Mb to 6Mb) that apparently reflected dynamic tuning by CLR. Native applications, in contrast, normally have zero variance in working set during idle.

The contrast between SharpReader and FeedDemon is more a reflection of the difference between a free application written as a hobby and a professionally written commercial application, and less as a indicator of Delphi's inherent performance advantage over C#. Performance issues with NewsGator, an Outlook-based reader, which I believed is managed, are likely due to the very high overhead and poor performance of OLE automation in general.

The biggest performance issues with RSS Bandit are memory usage and slowdown when performing IO-intensive operations like loading feed files from disk on startup or downloading lots of feeds for the first time. There are many approaches we've considered for resolving the memory issues. The first thing we will do is the easiest: making it possible for people to delete posts from feeds they are subscribed to. This would lead to fewer news items being held in memory when the application is running, thus reducing memory consumption.

I've also considered creating a 'memory lite' mode where some memory intensive features are disabled to reduce the memory usage of the application but the few people I've talked this over with have mentioned that memory usage has not been enough of a problem to forego features.


 

Categories: RSS Bandit

September 10, 2004
@ 04:17 PM

I've been a loyal user of WinAmp for several years. I am a big fan of skins and my favorite is currently MMD3. However I recently got tired of using the file system to navigate my music collection and sought out a change. I'd tried iTunes in the past but was underwhelmed by its lack of skinning functionality.

In the past few weeks I've given iTunes another shot and it is now my favorite player. It has a few elegant features that make it a killer app for organizing your music. The UI for navigating your music selection is straightforward and reminiscent of the iPod's. I also like the built-in playlists like 'Recently Played', 'My Top Rated' and 'Top 25 Most Played'.

The only downsides have been that I've had to update ID3 tags for my MP3s to get the most out of the music library UI and I miss some of the killer visualizations in WinAmp skins like MMD3. However this won't stop me from relegating WinAmp to the back burner and making iTunes my music player of choice.


 

Categories: Ramblings

September 8, 2004
@ 03:23 PM

Roger Costello recently started a discussion thread on the XML-DEV mailing list about the common misconceptions people have about XML document validation and schemas. He has since summarized the discussion in his post Fallacies of Validation, version #3. His post begins

The purpose of documenting the below "fallacies" is to identify erroneous common thought that many people have with regards to validation and its role in a system architecture.  Perhaps "assumptions" would be a better term to use than "fallacies".  In any case, the desire of this writeup (which is a compilation of discussions on the xml-dev list) is to provoke new ways of thinking about validation, and reject limiting and static views on validation. 

Fallacies of Validation

1. Fallacy of "THE Schema"

2. Fallacy of Schema Locality

3. Fallacy of Requisite Validation

4. Fallacy of Validation as a Pass/Fail Operation

5. Fallacy of a Universal Validation Language

6. Fallacy of Closed System Validation

7. Fallacy that Validation is Exclusively for Constraint Checking

I mostly agree with the fallacies as described in his post.

Fallacy #1 has been a favorite topic of Tim Ewald over the past year. It isn't necessarily true that there is one canonical schema for an XML vocabulary. Instead the schema for the vocabulary may depend on the context the XML document is being used in. A classic example of this is XHTML which has 3 schemas (DTDs) for a single format.

I consider Fallacy #2 to be more of a common mistake than a fallacy. Many people create validation rules that work in a local environment, such as specific patterns or structures for addresses or telephone numbers, which hold up in a local system but break down when used in a global environment like the World Wide Web. This common mistake isn't limited to XML validation but applies to all arenas where user input is validated before being stored or processed.

Fallacy #3 is interesting to me because I wonder how often it occurs in the wild. Are there really that many people who believe they have to validate XML documents against a schema?

Fallacy #4 is definitely a good one. However I disagree with the quotes he uses to buttress the main point for this fallacy. I especially don't like the fact that he uses a generalization from Rick Jelliffe about bugs in a few schema validators as a core part of his argument. The important point is that schema validation should not always be viewed as a pass/fail operation, and in fact schema languages like W3C XML Schema go out of their way to define how one can view an XML document as being part valid and part invalid.
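The .NET Framework 1.x bits already let you treat validation this way: attach a ValidationEventHandler and the reader reports each problem and keeps going instead of throwing on the first error, so you can decide which parts of the document you still trust. A minimal sketch, with placeholder file names:

using System;
using System.Xml;
using System.Xml.Schema;

class PartialValidationSketch
{
    static void Main()
    {
        XmlTextReader reader = new XmlTextReader("books.xml");
        XmlValidatingReader validator = new XmlValidatingReader(reader);
        validator.ValidationType = ValidationType.Schema;
        validator.Schemas.Add(null, "books.xsd");

        // With a handler attached, validation is no longer pass/fail: every
        // problem is reported and reading continues.
        validator.ValidationEventHandler += new ValidationEventHandler(OnValidationError);

        while (validator.Read()) { /* consume the document */ }
    }

    static void OnValidationError(object sender, ValidationEventArgs args)
    {
        Console.WriteLine("{0}: {1}", args.Severity, args.Message);
    }
}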

One size doesn't fit all is the message of Fallacy #5 to which I heartily cheer "Hear! Hear!". I agree 100%. There is no one XML schema language that satisfies every validation scenario.

I don't really understand Fallacy #6 without seeing some examples so I won't comment on it. I'll see if I can dig up the discussion threads about this on XML-DEV later.

Fallacy #7 is another one where I agree with the message but mostly disagree with how he argues the point. All of his examples are variations of using schemas for constraint checking; they just differ in how the document is processed after constraint checking is done. To me, the prime example of the fact that schema validation is not just for constraint checking is that many technologies are actually using schemas for creating typed XML documents or for translating XML from one domain to another (e.g. Object<->XML, Relational<->XML).

Everything said, this was a good list. Excellent work from Roger as usual.


 

Categories: XML

Recently I found out that we no longer had office supplies on the floor of the building I work in. Now if you need to grab a pen or get a marker after your last one runs out in the middle of a meeting you need to go upstairs. Folks have given me the impression that this is due to the recent cost cutting drive across the company. At first, I couldn't figure out why disrupting people by making them go to another floor for office supplies would cut costs.

Then it hit me. When faced with having to go to another floor to find office supplies, the average geek desk jockey will probably say "forget it" and do without. The immediate saving is less office supplies used. But I suspect this is only phase one of the plan. Most people at MSFT believe that on average 50% - 75% of the projects and features an employee works on in his career in the b0rg cube never ship. This is all just wasted cash. The best way to nip this in the bud is by preventing people from being able to write down their ideas or whiteboard different ideas with coworkers, thus spreading the meme about new projects or features. The amount of money saved by not investing in new money-losing ventures like *** and **** would be immense. It all makes a weird kind of sense now.

Seriously though. I've been reading, with skepticism, blog posts like Dangerous Transitions and Dangerous Thoughts which call for Microsoft to start performing targeted layoffs instead of cost cutting. When I think of the ways Microsoft spends immense amounts of cash for little return, I don't worry about John Doe the tester who files on average fewer bugs than the other members of his team or Jane Doe the developer who writes buggier code than the rest of her team. I think about things like MSN, Xbox, the uncertainty around MBF after purchasing Great Plains for billions, embarking on overambitious attempts to rewrite most of the APIs in Windows in an effort that spans 3 product units, spending years working on ObjectSpaces then canning it because there was potential overlap with WinFS, and various other white elephant projects.

All of the above cost from millions to billions of dollars and they are the result of decisions by folks in middle and upper management. I'm glad that Microsoft has decided not to punish rank and file employees for what are basically missteps by upper management in contravention to the typical way corporate America does business.

Ideally, we'd see our upper management address how they plan to avoid missteps like this in future instead of looking for minor bumps in the stock price and our paychecks by sacrificing some low level employees and coworkers.


 

Categories: Life in the B0rg Cube

September 6, 2004
@ 01:02 AM

According to HipHopGame.com Young Buck To 'Stomp' Out Luda/ T.I. Beef On Debut Album

If you have a mixtape featuring "Stomp," Young Buck's posse cut with T.I. and Ludacris, hold onto it. It's a collector's item. The track as we know it, with Cris and Tip battling each other, isn't going to be included on Buck's upcoming Straight Outta Cashville LP. Instead, a remix is going on the album, with newcomer D-Tay replacing T.I.
...
Buck says he asked 50 Cent to reach out to T.I. for a collaboration for Straight Outta Cashville. 50 obliged, and the track was sent to Atlanta for T.I. to rhyme on. Buck said he was surprised when the song came back with the line "And me getting beat down, that's ludicrous," because he didn't know if was a dis or not.

"I was hearing on the streets that [T.I.] and Luda be having problems with each other, and I know I just did a song with Luda's group about a week or two before," Buck elaborated. "Me and Luda are cool. To be all the way honest, I'd known Luda before I knew T.I., so I couldn't just jump on this record and have them having differences with each other, and then [have Luda] be like, 'Yo, Buck, what's up?' "

Staying diplomatic, Buck talked the situation over with Cris and even played T.I.'s verse for him. Ludacris confirmed that the two had been going back and forth, and he wanted to get on the song and speak his piece.

"I even got at T.I. like, 'Yo, Luda heard this record. He wanna jump on the record,' " Buck explained, "just to make sure all the feelings and everything would stay the same way. And he was like, 'Oh, I'm cool. I'm cool with it.' "

So Ludacris laced "Stomp" with his own battle raps, and the streets have been talking ever since.

T.I. and Cris have apparently now squashed their beef, Buck said, but controversy still surrounds the song. According to Buck, T.I.'s camp requested that Ludacris change his verse before they clear Tip to be on the album. (A T.I. spokesperson had no comment on that.) The G-Unit soldier said Cris has refused.

"Even throughout the song, you don't hear either one talking about killing each other," Buck lamented.

I'm not surprised Ludacris didn't want his rhymes removed. He totally schooled T.I. on that track. It's also a statement as to who is the bigger star that Luda's verses stay but T.I.'s will be removed given the standoff between both rappers. The track is hot, too bad it won't be making it onto the album.

By the way, Young Buck is wrong about them not talking about killing each other though. T.I.'s verse ends with "When the choppers hit you bitch, you'll wish you got your ass stomped."


 

If you use RSS Bandit and recently installed .NET Framework 1.1 SP1 you may have noticed that you started getting errors of the form

Refresh feed 'SomeCategory\SomeFeed' failed with error: The underlying connection was closed: The server committed an HTTP protocol violation.

This is due to changes made to the System.Net.HttpWebRequest class to make it more compliant to the HTTP specification. For example, it now errors when fetching the Microsoft Research feeds because the web server returns the Content-Location header as "Content Location" with a space. The fix is straightforward and involves placing the following element as a child of the configuration element within the rssbandit.exe.config file in the C:\Program Files\RssBandit folder.

<system.net>
 <settings>
  <httpWebRequest useUnsafeHeaderParsing="true" />
 </settings>
</system.net>

This is also taken care of by v1.2.0.117 of RSS Bandit. When run, it detects whether this option is available and enables it automatically so you don't have to mess around with XML configuration files.


 

Categories: RSS Bandit

In my recent post entitled The MSDN Camp vs. The Raymond Chen Camp I wrote

Our team [and myself directly] has gone through a process of rethinking a number of decisions we made in this light. Up until very recently we were planning to ship the System.Xml.XPath.XPathDocument class as a replacement for the System.Xml.XmlDocument class
...
The problem was that the XPathDocument had a radically different programming model than the XmlDocument meaning that anyone who'd written code using the XmlDocument against our v1.0/v1.1 bits would have to radically rewrite their code to get performance improvements and new features. Additionally any developers migrating to the .NET Framework from native code (MSXML) or from the Java world would already be familiar with the XML DOM API but not the cursor-based model used by the XPathDocument. This was really an untenable situation. For this reason we've reverted the XPathDocument to what it was in v1.1 while new functionality and perf improvements will be made to the XmlDocument. Similarly we will keep the new and improved XPathNavigator class (formerly the XPathEditableNavigator) which will be the API for programming against XML data sources where one wants to abstract away what the underlying store actually is. We've shown the power of this model with examples such as the ObjectXPathNavigator and the DataSetNavigator.

I've seen some concerned statements about this posts from XML developers who use System.Xml such as Oleg Tkachenko, Fumiaki Yoshimatsu and Tomas Restrepo so it seems I should clarify some of the decisions we made and why we made them.

In version 1.0 of the .NET Framework we provided two primary classes for interacting with XML: the XmlDocument and the XmlReader. The XmlReader provided an abstract interface for interacting with a stream of XML. One can create an XmlReader over textual XML using the XmlTextReader or over virtual XML data sources such as is done with the XmlCsvReader. On the other hand, with the XmlDocument we decided to eschew the approach favored by the Java world, which used interfaces. Instead we created a single concrete implementation. This turned out to be a bad idea. It tied the interface for programming against XML in a random access manner to a concrete implementation of an XML store. This made it difficult for developers who wanted to expose their data sources as XML stores and led to inefficient solutions such as the XmlDataDocument.

To rectify this we needed to separate the programming model for accessing XML data sources from our concrete implementation of the XmlDocument. We chose to do this by extending the cursor based programming model we introduced in v1 with the XPathNavigator instead of moving to an interface based approach with XmlDocument. The reason for choosing to go with a cursor based model over a tree based model is summed up in this quote from my article Can One Size Fit All?

In A Survey of APIs and Techniques for Processing XML, I pointed out that cursor-model APIs could be used to traverse in-memory XML documents just as well as tree-model APIs. Cursor-model APIs have an added advantage over tree-model APIs in that an XML cursor need not require the heavyweight interface of a traditional tree-model API where every significant token in the underlying XML must map to an object.

So in Whidbey, the XPathNavigator will be the programming model for working with XML data sources when one wants to abstract away from the underlying source. The XPathNavigator will be changed from the v1.0 model in the following ways (i) it will be editable and (ii) it will expose the post schema validation infoset. I've already worked with Krzysztof Cwalina on updating the Design Guidelines for Exposing XML data in WinFX to account for this change in affairs.

As for the XPathDocument, it is what it always has been: a class optimized for use in XPath and XSLT. If you need 10% - 25% better perf [depending on your scenario] when running XPath over an XML document or running XSLT over in-memory XML then this class should be preferred to the XmlDocument.
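In practice the split looks like the sketch below, written against the v1.1 APIs since the Whidbey surface is still subject to change; books.xml and the XPath expressions are placeholders.

using System;
using System.Xml;
using System.Xml.XPath;

class XPathDocumentVsXmlDocument
{
    static void Main()
    {
        // XPathDocument: a read-only store optimized for XPath and XSLT evaluation.
        XPathDocument xpathDoc = new XPathDocument("books.xml");
        XPathNavigator nav = xpathDoc.CreateNavigator();
        XPathNodeIterator cheapBooks = nav.Select("/books/book[price < 10]/title");
        while (cheapBooks.MoveNext())
            Console.WriteLine(cheapBooks.Current.Value);

        // XmlDocument: the familiar, editable DOM, which remains the class that
        // gets the new functionality and perf work going forward.
        XmlDocument dom = new XmlDocument();
        dom.Load("books.xml");
        XmlNode firstTitle = dom.SelectSingleNode("/books/book/title");
        Console.WriteLine(firstTitle.InnerText);
    }
}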


 

Categories: Life in the B0rg Cube | XML

September 3, 2004
@ 03:54 AM

Joe Beda is leaving his position as a dev lead on Avalon to go work at Google. I learned about this a couple of days ago but was waiting for him to blog the news first. I suspect there'll be a lot of attrition across various product teams at Microsoft given the recent news about Longhorn.


 

Categories: Life in the B0rg Cube

Yesterday I installed .NET Framework v1.1 service pack 1 and it messed up my ASP.NET permissions. I decided to use this opportunity to kill two birds with one stone. My weblog is currently hosted on my Windows XP machine using IIS meaning that there are several limitations on the web server. The limitation on number of connections means several times during the day people get "Too Many Users" errors when connecting to this website.

I decided to install Apache and try out Movable Type 3.1. That led to a wasted morning trying to install various Perl modules. I tried some more when I got back from work and eventually gave up. Torsten gave me some tips this morning which fixed my ASP.NET permissions and my weblog is back up.

In the meantime it turns out that the v1.2.0.114 SP1 installer for RSS Bandit had a number of issues. If you're an RSS Bandit user please upgrade to v1.2.0.117.


 

Categories: Ramblings | RSS Bandit