A couple of weeks ago I got a bug filed against me in RSS Bandit with the title "Weak duplicate feed detection" and the following description:

I already subscribed to "http://feeds.haacked.com/haacked" feed. Then while browsing the feed's homepage the feed autodetection gets refreshed with a "new feed found", url: "http://feeds.haacked.com/haacked/".  It is not detected as a duplicate (ends with backslash) there and also not detected in the subscription wizard.

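Even though I argued that lots of URLs that seem equivalent to end users aren't equivalent according to the specs, I decided to go ahead and fix the two common types of equivalence that trip up end users: URLs that differ only by a trailing slash, and URLs that differ only by a leading "www." in the host name.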

Within a few hours of shipping this in version 1.6.0.2 of RSS Bandit, the bug reports started coming in hard. There's a thread in our forums with the title "URL Corruption After Adding a Feed" with the following complaints:

After I add the feed at: http://www.simple-talk.com/feed/
I get an error in the Error Log and when I restart RSS Bandit, the URL has been truncated to http://www.simple-talk.com

Possibly related problem:
When I add "http://www.amazonsellercommunity.com/forums/rss/rssmessages.jspa?forumID=22" then RSS Bandit slices and dices it into "http://amazonsellercommunity.com/forums/rss/rssmessages.jspa?forumID=22".
Sorry about the period outside the quotation mark. I wanted to make sure the URL was clear.

Same problem--have a feed, I've used it in the past with RSSBandit and was trying to enter it (tried the usual way first, then turned off autodiscover, then tried to change it via properties) but no matter what I do the www. in front disappears, and the feed doesn't work. http://www.antipope.org/charlie/blog-static/

Each of these is an example where the URL works when the domain starts with "www" but doesn't if you take it out. This is definitely a case of going from bad to worse. We went from the minor irritation of duplicate feeds not being detected when you subscribe to the same feed twice to users being unable to access certain feeds at all.
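To make the failure concrete, here is a minimal Python sketch of that kind of over-eager canonicalization. It is not RSS Bandit's actual code; the function name `aggressive_canonicalize` and its exact rules are assumptions made up for illustration. It strips a leading "www." from the host and a trailing slash from the path, then returns the rewritten URL.

```python
from urllib.parse import urlsplit, urlunsplit


def aggressive_canonicalize(url: str) -> str:
    """Hypothetical rule set: drop a leading 'www.' and any trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc
    # Assumption for illustration: treat "www.example.com" and "example.com"
    # as the same host. Nothing in DNS or the URI specs guarantees this.
    if host.lower().startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


reported = "http://www.amazonsellercommunity.com/forums/rss/rssmessages.jspa?forumID=22"
print(aggressive_canonicalize(reported))
# http://amazonsellercommunity.com/forums/rss/rssmessages.jspa?forumID=22
```

The rewritten URL names a different host. If that host has no DNS entry or serves something else, the feed is gone once the rewritten form is what gets stored and fetched.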

My apologies to everyone affected by this problem. I will be dialing back the canonicalization process to only treat trailing slashes as equivalent. Expect the installer to be refreshed within the next hour or so.
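A rough sketch of what the dialed-back rule could look like, again in Python and again not the actual RSS Bandit code. The names `comparison_key` and `is_duplicate` are hypothetical, and the sketch assumes the normalized form is used only as a key for duplicate detection while the URL the user entered is left alone.

```python
from urllib.parse import urlsplit, urlunsplit


def comparison_key(url: str) -> str:
    """Hypothetical key: the URL with trailing slashes removed from its path.

    The host is left untouched, so "www.example.com" and "example.com"
    stay distinct.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def is_duplicate(new_url: str, subscribed: list[str]) -> bool:
    """True if new_url matches an existing subscription up to a trailing slash."""
    key = comparison_key(new_url)
    return any(comparison_key(existing) == key for existing in subscribed)


subscriptions = ["http://feeds.haacked.com/haacked"]
print(is_duplicate("http://feeds.haacked.com/haacked/", subscriptions))      # True
print(is_duplicate("http://www.feeds.haacked.com/haacked", subscriptions))   # False
```

With this rule the original bug report is still handled, since the two haacked.com URLs compare equal, but a "www." host is never rewritten.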

Now Playing: Young Jeezy feat. Swizz Beatz - Money In Da Bank (Remix)


 

Monday, 11 February 2008 15:42:39 (GMT Standard Time, UTC+00:00)
Slightly confused. Wasn't the original request to detect duplicate and not to edit URLs?
pwb
Monday, 11 February 2008 16:39:03 (GMT Standard Time, UTC+00:00)
URL comparisons happen in a lot of places in the code base. It was easier to store and represent all URLs in a canonicalized format than pick and choose different places in the code to use different comparison rules for determining URI equivalence.
Monday, 11 February 2008 16:47:52 (GMT Standard Time, UTC+00:00)
Wouldn't canonizing "example.com" -> "www.example.com" (instead of the other way around) solve the second problem? And maybe a short "Ping" of the canonized URL (to make sure it exists) would be a good idea?

Regards,
tamberg
Monday, 11 February 2008 16:50:42 (GMT Standard Time, UTC+00:00)
("canonized" should be "canonicalized")
Monday, 11 February 2008 17:07:18 (GMT Standard Time, UTC+00:00)
tamberg,
The problem exists both ways. There is no guarantee that a web site will respond if you prepend "www." to the host name. For example, try going to http://www.del.icio.us instead of http://del.icio.us. Secondly, just because a server doesn't respond to a ping doesn't mean the URL is invalid, since there are all sorts of legitimate reasons for a server not to respond temporarily.

I knew it was a bad assumption to make when I made the change but decided to do it because I didn't want adherence to RFCs to trump user experience. Next time I won't second-guess my own decisions on such topics.
Tuesday, 12 February 2008 00:58:23 (GMT Standard Time, UTC+00:00)
Maybe this is explained somewhere, but I'll just pose the question here: what value does it add to have a site at www.something.com instead of just something.com? Is there some technical reasoning behind this? And is there some standard that states this is recommended or some such?

To me personally the HTTP:// signifies that I should probably open it in the browser instead of some other app. www seems kind of redundant then?
ac