Working on RSS Bandit is my hobby, and sometimes I retreat to it when I need to unwind from the details of work or just need a distraction. This morning was one of those moments. I decided to look into the issue raised in the thread on our forums entitled MSN Spaces RSS Feeds Issues - More Info, where some of our users complained about a cookie parsing error when subscribed to feeds from MSN Spaces.

Before I explain what the problem is, I'd like to show an example of what an HTTP cookie header looks like, taken from the Wikipedia entry for HTTP cookie:

Set-Cookie: RMID=732423sdfs73242; expires=Fri, 31-Dec-2010 23:59:59 GMT; path=/; domain=.usatoday.com

Note the use of a semicolon as a delimiter for separating cookies. It turned out that the error was in the second line of the following code:


    if (cookieHeaders.Length > 0) {
        container.SetCookies(url, cookieHeaders.Replace(";", ","));
    }

You'll note that we replace the semicolon delimiters with commas. Why would we do such a strange thing when the example above shows that cookies can contain commas? It's because the CookieContainer.SetCookies method in the .NET Framework requires the delimiters to be commas. WTF?

This seems so fundamentally broken that I feel I must be mistaken. I've tried searching online for possible solutions, but I couldn't find anyone else who has had this problem. Am I using the API incorrectly? Am I supposed to parse the cookie by hand before feeding it to the method? If so, why would anyone design the API in such a brain-damaged manner?

*sigh*

I was having more fun drafting my specs for work.

Update: Mike Dimmick has pointed out in a comment below that my understanding of cookie syntax is incorrect. The cookie shown in the Wikipedia example is one cookie, not four as I thought. It looks like simply grabbing sample code from blogs may not have been a good idea. :) This means that I may have been getting malformed cookies when fetching the MSN Spaces RSS feeds after all. Now if only I can repro the problem...
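
To make the corrected reading concrete, here is a minimal sketch that splits the Wikipedia header the way Mike describes it: the first segment is the cookie itself, and every segment after a semicolon is an attribute of that cookie. The names SetCookieSketch, Parse and SplitPair are made up for illustration; this is not RSS Bandit code and not a .NET Framework API. The parsing is deliberately naive and assumes a single cookie per header with no quoted values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of RSS Bandit or .NET.
static class SetCookieSketch
{
    // Splits ONE Set-Cookie header value into the cookie's name/value pair
    // and its attributes. Semicolons separate attributes, not cookies.
    public static void Parse(string header, out string name, out string value,
                             IDictionary<string, string> attributes)
    {
        string[] parts = header.Split(';');

        // The first segment is the NAME=VALUE pair of the cookie itself.
        SplitPair(parts[0], out name, out value);

        // Every later segment is an attribute (expires, path, domain, ...).
        for (int i = 1; i < parts.Length; i++)
        {
            if (parts[i].Trim().Length == 0) continue; // tolerate a trailing ';'
            string attrName, attrValue;
            SplitPair(parts[i], out attrName, out attrValue);
            attributes[attrName.ToLowerInvariant()] = attrValue;
        }
    }

    // Splits "key=value" at the first '='; a bare token gets an empty value.
    static void SplitPair(string segment, out string key, out string val)
    {
        int eq = segment.IndexOf('=');
        if (eq < 0) { key = segment.Trim(); val = ""; return; }
        key = segment.Substring(0, eq).Trim();
        val = segment.Substring(eq + 1).Trim();
    }

    static void Main()
    {
        string header = "RMID=732423sdfs73242; expires=Fri, 31-Dec-2010 23:59:59 GMT; path=/; domain=.usatoday.com";

        Dictionary<string, string> attributes = new Dictionary<string, string>();
        string name, value;
        Parse(header, out name, out value, attributes);

        // One cookie (RMID) and three attributes, rather than four cookies.
        Console.WriteLine("cookie: {0} = {1}", name, value);
        foreach (KeyValuePair<string, string> a in attributes)
            Console.WriteLine("  attribute: {0} = {1}", a.Key, a.Value);
    }
}
```

Because the split is on semicolons only, the comma inside the expires date stays inside that attribute's value. Replacing every semicolon with a comma, as the code above did, throws away exactly that distinction.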

Wednesday, April 12, 2006 5:14:31 PM (GMT Daylight Time, UTC+01:00)
I hate it when stuff like that happens.

Does RSS Bandit work under Linux using Mono?
warren
Wednesday, April 12, 2006 5:26:35 PM (GMT Daylight Time, UTC+01:00)
This was extracted from a cookie article:

NAME=VALUE
This string is a sequence of characters excluding semi-colon, comma and white space. If there is a need to place such data in the name or value, some encoding method such as URL style %XX encoding is recommended, though no encoding is defined or required.

So any commas in the cookie should be escaped.
Victor Edmondson
Wednesday, April 12, 2006 6:36:11 PM (GMT Daylight Time, UTC+01:00)
And the plot thickens. Note this from the MSDN docs for the CookieContainer.GetCookieHeader method: "Note that the exact format of the string depends on the RFC that the Cookie conforms to. The strings for all the Cookie instances that are associated with uri are combined and delimited by semicolons. This string is not in the correct format for use as the second parameter of the SetCookies method."

...So it's not even as if it was *assumed* that the cookie header would be delimited by commas. Given that the cookie header existed before the GetCookieHeader and SetCookies methods, a naive observer might wonder why the latter weren't simply designed to work with it. Perhaps the remark about the exact format of the string being RFC-dependent implies that the headers are too inconsistent to use reliably as a generic programmatic source of cookies for SetCookies.
Wednesday, April 12, 2006 11:57:36 PM (GMT Daylight Time, UTC+01:00)
RFC 2109 and 2965 state that multiple cookies can be set in a single Set-Cookie or Set-Cookie2 header, and that the cookies are comma-separated. The comma is _not_ a valid character in a cookie without being escaped.

Looking through the source of System.Net.CookieTokenizer.Next, it looks like the parser has a specific exemption for a comma appearing in the date format of the Expires section of the token.

In the example you give, the semicolon is not separating cookies but separating cookie attributes, which is a different thing: it does not set four cookies (RMID, expires, path and domain) but a single cookie, RMID, with the given attributes. You should not replace these semicolons with commas.
Thursday, April 13, 2006 3:22:02 AM (GMT Daylight Time, UTC+01:00)
Mike,
It seems I may have been dealing with malformed cookies from MSN Spaces after all. Thanks for the explanation.