May 4, 2003
@ 12:58 AM
So I decided to spend this morning implementing some new features for the underlying RSS processing bits that RSS Bandit uses. The first thing I did was add HTTP compression support, since folks like Scott Watermasysk have complained about the bandwidth load from aggregators that fail to request compressed files. This was fairly straightforward since all I had to do was use #ziplib.
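For anyone curious what client-side compression involves, here's a minimal sketch. It assumes the aggregator sends an `Accept-Encoding: gzip, deflate` request header and then decodes the body according to the response's `Content-Encoding`. It uses `GZipStream` and `DeflateStream` from `System.IO.Compression` (which arrived in .NET 2.0, after this post) in place of #ziplib. The `FeedDecoder` class and its `Decode` method are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: wraps a raw HTTP response body in the matching
// decompression stream based on the Content-Encoding response header.
// Uses System.IO.Compression instead of #ziplib (an assumption, not RSS Bandit's code).
public static class FeedDecoder
{
    // Value a client would send in the Accept-Encoding request header.
    public const string AcceptEncoding = "gzip, deflate";

    public static Stream Decode(Stream body, string contentEncoding)
    {
        string encoding = (contentEncoding ?? "").Trim().ToLowerInvariant();
        switch (encoding)
        {
            case "gzip":
            case "x-gzip":
                return new GZipStream(body, CompressionMode.Decompress);
            case "deflate":
                // HTTP "deflate" is nominally zlib-wrapped, but many servers send
                // raw deflate data, which is what DeflateStream expects.
                return new DeflateStream(body, CompressionMode.Decompress);
            default:
                // No encoding (or "identity"): pass the body through unchanged.
                return body;
        }
    }
}
```

With an `HttpWebRequest`, you would set `req.Headers["Accept-Encoding"] = FeedDecoder.AcceptEncoding` and read from `FeedDecoder.Decode(resp.GetResponseStream(), resp.ContentEncoding)`. Later framework versions can do this for you via `HttpWebRequest.AutomaticDecompression`.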

The second thing I wanted to do was support HTTP 301 responses, which indicate that the RSS feed has permanently moved to a new location. Mark Pilgrim complained about serving 4000 permanent redirects a day, which shouldn't happen if aggregators updated the feed URL once they got a 301. Implementing this feature was not as straightforward as I thought and is still not complete. The first problem arose because RSS Bandit uses the feed URL to uniquely identify nodes in its tree view, so now it has to deal with the fact that these unique identifiers can change any time the user requests the feed. This is made trickier by the fact that feeds are requested via an asynchronous call so as not to tie up the GUI, yet this async call may change fundamental aspects of the data structures the GUI relies on. Getting it to work correctly is not a big deal in the long run, but it was more work than I expected.
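One way to cope with identifiers that can change mid-request is to keep the URL-to-node mapping behind a lock and rekey it in a single step when a permanent move is detected. This is a hypothetical sketch, not RSS Bandit's code: the `FeedIndex` class and its methods are made up, and it assumes the background download callback calls `Rekey` while the GUI thread only reads via `TryGet`. In a WinForms app, any actual tree-view change would still have to be marshaled to the UI thread (for example with `Control.BeginInvoke`).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical index from feed URL to the GUI's per-feed state (e.g. a tree node).
// A single lock keeps lookups from the GUI thread consistent with renames
// made by the asynchronous download callback.
public sealed class FeedIndex<TNode>
{
    private readonly Dictionary<string, TNode> nodes = new Dictionary<string, TNode>();
    private readonly object sync = new object();

    public void Add(string url, TNode node)
    {
        lock (sync) { nodes[url] = node; }
    }

    public bool TryGet(string url, out TNode node)
    {
        lock (sync) { return nodes.TryGetValue(url, out node); }
    }

    // Moves the entry for oldUrl to newUrl while holding the lock, so no reader
    // sees the feed under both keys or under neither.
    // Returns false if oldUrl is unknown or newUrl is already in use.
    public bool Rekey(string oldUrl, string newUrl)
    {
        lock (sync)
        {
            TNode node;
            if (!nodes.TryGetValue(oldUrl, out node) || nodes.ContainsKey(newUrl))
                return false;
            nodes.Remove(oldUrl);
            nodes[newUrl] = node;
            return true;
        }
    }
}
```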

The bigger problem comes with trying to actually process 301 responses directly. The HttpWebRequest class has an AllowAutoRedirect property which automatically follows redirects, but this means the only way to tell whether a redirect occurred is with the following code:
bool hasChanged = (req.RequestUri != req.Address);
Now the problem with this code is that you can't tell what type of redirect it was. If it was a 301, then I need to update the URL for the feed so I don't send redundant HTTP requests later on. However, the server could have sent any of several other redirects, such as a 307 that sends you to an error page when some internal error occurred (never mind that a properly configured web server should send a 500; tell that to the .NET Weblogs folks), in which case the feed URL shouldn't be updated.
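The distinction that AllowAutoRedirect hides can be written down as a small decision rule. This sketch rests on my own assumptions: only a permanent redirect justifies rewriting the stored feed URL, 302, 303 and 307 are followed for the current request only, and 308 (Permanent Redirect, standardized years after this post) is treated like 301. The `RedirectKind` enum and `RedirectRules.Classify` method are hypothetical.

```csharp
using System;

// Hypothetical classification of HTTP status codes for a feed reader.
public enum RedirectKind
{
    None,       // nothing to follow (e.g. 200, 304 Not Modified, 4xx/5xx)
    Temporary,  // follow for this request, keep the stored feed URL
    Permanent   // follow and replace the stored feed URL
}

public static class RedirectRules
{
    public static RedirectKind Classify(int statusCode)
    {
        switch (statusCode)
        {
            case 301: // Moved Permanently
            case 308: // Permanent Redirect (defined much later than this post)
                return RedirectKind.Permanent;
            case 302: // Found
            case 303: // See Other
            case 307: // Temporary Redirect
                return RedirectKind.Temporary;
            default:
                return RedirectKind.None;
        }
    }
}
```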

So my only alternative is to turn off redirect support, catch the WebException that is thrown when the HttpWebRequest class gets a 3xx response, inspect its Response property, and then repeat the request after deciding whether I should update the feed URL. Of course, this also has to take into account that the web server may send a chain of several redirects. Am I the only one who thinks that such "normal program flow" code has no business being in a catch block?
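Here's a sketch of that manual loop, with the HTTP call abstracted behind a delegate so the redirect policy can be seen (and tested) apart from HttpWebRequest and its exception handling. The assumptions are mine: a caller-supplied hop limit, and the stored feed URL advances only while every hop in the chain so far has been a 301, so a temporary redirect anywhere earlier means later 301s are not trusted. `HopResult`, `RedirectFollower` and `Follow` are hypothetical names.

```csharp
using System;

// Hypothetical result of one HTTP request made with auto-redirect turned off.
public sealed class HopResult
{
    public int StatusCode;
    public Uri Location; // Location header, if any (may be relative)
    public HopResult(int statusCode, Uri location) { StatusCode = statusCode; Location = location; }
}

public static class RedirectFollower
{
    // Follows at most maxHops redirects starting at feedUrl.
    // finalUrl: where the content was actually fetched from.
    // newFeedUrl: what the stored feed URL should become; it only advances
    // while every hop so far has been a 301.
    public static HopResult Follow(Uri feedUrl, Func<Uri, HopResult> fetch, int maxHops,
                                   out Uri finalUrl, out Uri newFeedUrl)
    {
        Uri current = feedUrl;
        newFeedUrl = feedUrl;
        bool allPermanent = true;

        for (int hop = 0; ; hop++)
        {
            HopResult result = fetch(current);
            bool isRedirect = result.StatusCode == 301 || result.StatusCode == 302 ||
                              result.StatusCode == 303 || result.StatusCode == 307;
            if (!isRedirect || result.Location == null)
            {
                finalUrl = current;
                return result;
            }
            if (hop >= maxHops)
                throw new InvalidOperationException("Too many redirects for " + feedUrl);

            // Resolve a relative Location against the URL that was just requested.
            Uri next = new Uri(current, result.Location);
            allPermanent = allPermanent && result.StatusCode == 301;
            if (allPermanent)
                newFeedUrl = next;
            current = next;
        }
    }
}
```

In the setup described above, `fetch` would wrap an HttpWebRequest with AllowAutoRedirect set to false and pull the status code and Location header out of the WebException's Response when a 3xx arrives. That doesn't get rid of the catch block, but it at least confines it to one place instead of tangling it with the redirect policy.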

*sigh*
 
