During the most recent Download.Ject Internet Explorer incident [which was significant enough that I saw newspaper headlines and TV news reports advising people to switch browsers] I got requests from RSS Bandit users asking us to change the browser embedded in RSS Bandit, since they had stopped using Internet Explorer due to security concerns.

Torsten and I looked around to see how feasible this would be and found the Mozilla ActiveX control, which enables one to embed the Mozilla browser engine (Gecko) into any application that can host ActiveX controls. The control implements the same APIs as the Internet Explorer control, so making this change may be straightforward.

I have some concerns about doing this.

  1. We've had problematic COM interop between RSS Bandit and IE, which has resulted in bugs such as dozens of IE windows being spawned and, most recently, memory corruption errors. I am wary of moving to an unknown quantity like Gecko and facing similar issues without the benefit of prior experience working with the component.

  2. There's a question of whether we replace our dependency on IE outright, ship an option to use Gecko instead of IE, or ship separate Gecko and IE versions. The installer for the Mozilla ActiveX control is currently larger than the RSS Bandit download itself, so we'd more than double the size of our download if we tied ourselves to Gecko.

I'm curious as to what RSS Bandit users think. Currently, making such a switch is not in our plans, but I am always interested in feedback from our users on what they think the right thing to do is.


 

Categories: RSS Bandit

July 21, 2004
@ 07:15 AM

It seems that every 3 months some prominent online publication complains about the amount of traffic RSS news readers cause to websites that provide RSS feeds. This time it is Slashdot with their post When RSS Traffic Looks Like a DDoS, which references a post by Chad Dickerson, the CTO of InfoWorld, entitled RSS growing pains. Chad writes

Several months ago, I spoke to a Web architect at a large media site and asked why his site didn’t support RSS. He raised the concern that thousands (or even millions) of dumb clients could wreak havoc on a popular Web site. Back when I was at CNN.com, I recall that our servers got needlessly pounded by a dumb client (IE4) requesting RSS-like CDF files at frequent intervals regardless of whether they had changed. As the popularity of RSS feeds at InfoWorld started to surge, I began to notice that most of the RSS clients out there requested and downloaded our feeds regardless of whether the feeds themselves had changed. At the time, we hadn’t quite reached the RSS tipping point, so I filed these thoughts away for later -- but “later” came sooner than I thought.

At this point I'd like to note that HTTP provides two mechanisms (the Last-Modified and ETag validators used in conditional GET requests) for web servers to tell clients whether a network resource has changed. The basics of this mechanism are explained in the blog post HTTP Conditional Get for RSS Hackers, which describes how to prevent clients such as news readers from repeatedly downloading a Web document that hasn't been updated. At the time of writing, the InfoWorld RSS feed supports neither mechanism.
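
To illustrate, here is a minimal sketch of a conditional GET from the client side using HttpWebRequest. The feed URL and validator values are placeholders; a real aggregator would persist the Last-Modified and ETag values the server returned on the previous successful download.

    using System;
    using System.Net;

    class ConditionalGetDemo
    {
        static void Main()
        {
            // Placeholder URL and validators; a real reader reuses the values
            // the server sent back on the previous successful download.
            string feedUrl = "http://www.example.com/rss.xml";
            DateTime lastModified = new DateTime(2004, 7, 20);
            string etag = "\"abc123\"";

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(feedUrl);
            request.IfModifiedSince = lastModified;   // sends the If-Modified-Since header
            request.Headers["If-None-Match"] = etag;  // sends the ETag validator

            try
            {
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                {
                    Console.WriteLine("Feed changed, {0} bytes downloaded", response.ContentLength);
                }
            }
            catch (WebException ex)
            {
                HttpWebResponse response = ex.Response as HttpWebResponse;
                if (response != null && response.StatusCode == HttpStatusCode.NotModified)
                {
                    // 304 Not Modified: the server sent headers only, no feed body.
                    Console.WriteLine("Feed unchanged, nothing downloaded");
                }
                else
                {
                    throw;
                }
            }
        }
    }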

Another technique for reducing bandwidth consumption by HTTP clients is to use HTTP compression, which greatly reduces the amount of data that has to be sent to a client when the feed does need to be downloaded. For example, the current InfoWorld feed is 7427 bytes, which shrinks to 2551 bytes when compressed using GZip on my home machine. That is roughly a threefold reduction, and on larger files the compression ratio is typically even better. Again, InfoWorld doesn't support this technique for reducing bandwidth consumption.
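
For anyone who wants to reproduce that kind of measurement, here is a minimal sketch using GZipStream (available in the .NET Framework from version 2.0 onward); the file name is just a placeholder for a locally saved copy of a feed.

    using System;
    using System.IO;
    using System.IO.Compression;

    class FeedCompressionDemo
    {
        static void Main()
        {
            // Hypothetical local copy of a feed; the file name is illustrative only.
            byte[] original = File.ReadAllBytes("infoworld-feed.xml");

            byte[] compressed;
            using (MemoryStream ms = new MemoryStream())
            {
                using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress))
                {
                    gz.Write(original, 0, original.Length);
                }
                compressed = ms.ToArray();
            }

            Console.WriteLine("Original:   {0} bytes", original.Length);
            Console.WriteLine("Compressed: {0} bytes", compressed.Length);
        }
    }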

It is unsurprising that they are seeing significant bandwidth consumption from news aggregators. An RSS reader polling the InfoWorld site once an hour over an 8 hour period downloads about 60 kilobytes of XML (8 requests at roughly 7.5 kilobytes each). On the other hand, if the site supported HTTP conditional GET requests and HTTP compression via GZip encoding, that number would be under 3 kilobytes, since only the first request would need to return the full compressed feed and unchanged feeds would return small 304 Not Modified responses thereafter.

The one thing that HTTP doesn't provide is a way for a site to deal with numerous clients connecting to it at once. However, this problem isn't much different from the traditional scaling problem that web sites already have to deal with today when they get a lot of traffic from regular readers.


 

Today Arpan (the PM for XML query technologies in the .NET Framework) and I were talking about features we'd like to see on our 'nice to have' list for the Orcas release of the .NET Framework. One of the things we thought would be really nice to see in the System.Xml namespace was XPath 2.0. Then Derek, being the universal pessimist, pointed out that we already have APIs supporting XPath 1.0 that take only a string as an argument (e.g. XmlNode.SelectNodes), so we'd have difficulty adding support for another version of XPath without contorting the API.
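
The issue is easiest to see from the existing API shape; here is a minimal sketch with a made-up document. Nothing in the method signature says which version of XPath the string query is written against, so silently upgrading the engine would change the meaning of existing calls.

    using System;
    using System.Xml;

    class XPathVersionDemo
    {
        static void Main()
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml("<books><book price='9.99'/><book price='29.99'/></books>");

            // The query is just a string; the API gives no way to state whether it
            // is an XPath 1.0 or an XPath 2.0 expression.
            XmlNodeList cheap = doc.SelectNodes("/books/book[@price < 10]");
            Console.WriteLine("Matches: {0}", cheap.Count);
        }
    }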

Not to be dissuaded, I pointed out that XPath 2.0 has a backwards compatibility mode which makes it compatible with XPath 1.0. Thus we wouldn't have to change our Select methods or introduce new methods for XPath 2.0 support, since all queries that used to work against our Select methods would still work if we upgraded our XPath implementation to version 2.0. This is where Arpan hit me with the one-two punch. He introduced me to a section of the XPath 2.0 spec called Incompatibilities when Compatibility Mode is true which reads

The list below contains all known areas, within the scope of this specification, where an XPath 2.0 processor running with compatibility mode set to true will produce different results from an XPath 1.0 processor evaluating the same expression, assuming that the expression was valid in XPath 1.0, and that the nodes in the source document have no type annotations other than xdt:untypedAny and xdt:untypedAtomic.

I was stunned by what I read and I am still stunned now. The W3C created XPath 2.0, which is backwards incompatible with XPath 1.0, and added a compatibility mode option to make it backwards compatible with XPath 1.0, yet it still isn't fully backwards compatible even in this mode? This seems completely illogical to me. What is the point of having a backwards compatibility mode if it isn't backwards compatible? Well, I guess now I know that if we do decide to ship XPath 2.0 in the future we can't just add support for it transparently to our existing classes without causing some API churn. Unfortunate.

Caveat: The fact that a technology is mentioned as being on our 'nice to have' list or is suggested in a comment to this post is not an indication that it will be implemented in future versions of the .NET Framework.


 

Categories: XML

July 17, 2004
@ 02:40 AM

Dave Winer writes

Russ Beattie says we should be careful not to give the Republicans ammo to kill Kerry. I am sorry Russ, I'm not worried about that. I'm more worried that the Dems are too flustered by the hardball tacticts of the Reps to fight back.

The only time I tend to watch regular TV that isn't TiVo is while working out in the morning at the health club. I've noticed that while John Kerry's ads tend to be about the qualities that make him a good candidate for president, George Bush's ads have mostly been negative ads attacking John Kerry. Personally I would love it if Kerry's campaign continues to take the high ground and shows the Republican party up for the rabid attack dogs that they are. The problem with this is that negative ads work, and some people view not hitting back as a sign of weakness, which seems to be what Dave Winer is doing.

Whatever happened to trying to change the tone in Washington and elevate the discourse? Just another case of "Do what I say, not what I do", I guess.

 


 

Categories: Ramblings

I was reading an XML-Deviant column on XML.com entitled Browser Boom when I came across the following excerpt

The inevitable association with Microsoft's CLI implementation is proving a source of difficulty for the Mono project. The principal author of Mono's XML support, Atsushi Eno, posted to the Mono mailing list on the problems of being conformant in Mono's XML parser implementation. More specifically, whose rules should Mono conform to. W3C or Microsoft?

MS XmlTextReader is buggy since it accepts XML declaration as element content (that violates W3C XML specification section 3 Logical Structures). ... However, there is another discussion that it is useful that new XmlTextReader (xmlText, XmlNodeType.Element, null) accepts XML declaration.

... that error-prone XmlTextReader might be useful (especially for people who already depends on that behavior)

... we did not always reject Microsoft badness; for example we are copying System.Xml.XmlCDataSection that violates W3C DOM interface hierarchy (!)

The root of the dilemma is similar to that which Mozilla and Opera are trying to manage in the browser world.

What I find interesting is that instead of pinging the MSFT XML folks (like myself) and filing a bug report, this spawned a dozen-message email discussion on whether Mono should be bug compatible with the .NET Framework. Of course, if the Mono folks decide to be bug compatible with this and other bugs in System.Xml, and we then fix them (causing breaking changes in some cases), will we see complaints about how Microsoft is out to get them by being backwards incompatible? Now that Microsoft has created the MSDN Product Feedback Center, they don't even have to track down the right newsgroup or the email address of a Microsoft employee to file the bug.

It's amazing to me how much work people cause for themselves, and how many conspiracy theories they'd rather entertain, instead of simply communicating with others.

Update: I talked to the developer responsible for the XmlTextReader class and she responded "This is by design. We allow XML declaration in XML fragments because of the encoding attribute. Otherwise the encoding information would have to be transferred outside of the XML and manually set into XmlParserContext."
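
For the curious, here is a minimal sketch of the behavior being discussed, using the same fragment constructor mentioned in Atsushi's post; the fragment text itself is made up.

    using System;
    using System.Xml;

    class FragmentDeclarationDemo
    {
        static void Main()
        {
            // An XML fragment that begins with an XML declaration. Per the
            // discussion above, this is accepted by design so that the encoding
            // attribute can travel with the fragment.
            string xmlText = "<?xml version='1.0' encoding='utf-8'?><item>hello</item>";

            XmlTextReader reader = new XmlTextReader(xmlText, XmlNodeType.Element, null);
            while (reader.Read())
            {
                Console.WriteLine("{0}: {1}", reader.NodeType, reader.Name);
            }
        }
    }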


 

Categories: Life in the B0rg Cube | XML

Taken from an article on TheServerSide.com entitled Microsoft Responds to Sun’s Web Service Benchmarks

In a paper published last month, Sun claimed that Java based web services outperform .NET based web services both in throughput and response times. Microsoft has released a paper on TheServerSide.NET responding to those claims stating that Sun’s representation of the .NET performance was understated by 2 to 3 times and that in many, but not all cases, .NET exceeded the Java benchmarks.
...
Read the Microsoft response on TheServerSide.NET: Web Services Performance: Comparing J2EE and .NET

Read Sun's original paper: J2EE claimed to have better Web Services performance than .NET

It should be noted that Sun did not publish source code for their benchmark, so Microsoft had to re-create Sun's benchmark based on the details in the original paper. The Microsoft response has the full source code for both the .NET Web Service implementation and the Java Web Service implementation using Sun's JWSDP 1.4, along with the test program used to benchmark both services. I always believe the best way to verify a benchmark is to run it yourself. The performance of the .NET XML Web Service implementations should prove to be a lot better than what is implied by the original paper from Sun.


 

July 14, 2004
@ 09:14 PM

In the midst of a back and forth internal discussion on whether it is appropriate for folks to be griping about the recently announced MSFT benefit cuts on their work-related blogs, someone sent me a link to the Mini-Microsoft blog, which describes itself thusly

Let's slim down Microsoft into a lean, mean, efficient customer pleasing profit making machine! Mini-Microsoft, Mini-Microsoft, lean-and-mean!

Subscribed!!!


 

A little while ago some members of our team experimented with various ways to reduce the Relational<->Objects<->XML (ROX) impedance mismatch by adding concepts and operators from the relational and XML (specifically W3C XML Schema) worlds into an object oriented programming language. This effort was spearheaded by a number of smart folks on our team including Erik Meijer, Matt Warren, Chris Lovett and a bunch of others, all led by William Adams. The object oriented programming language which was used as a base for extension was C#. The new language was once called X# but eventually became known as Xen.

Erik Meijer presented Xen at XML 2003 and I blogged about his presentation after the conference. There have also been two papers published about the ideas behind Xen: Programming with Rectangles, Triangles, and Circles and Unifying Tables, Objects and Documents. It's a new year and the folks working on Xen have moved on to other endeavors related to future versions of Visual Studio and the .NET Framework.

However Xen is not lost. It is now part of the Microsoft Research project Cω (pronounced C-Omega). Even better, you can download a preview of the Cω compiler from the Microsoft Research downloads page.


 

Categories: Technology | XML

Torsten has a blog post about an interesting bug in RSS Bandit. If you are subscribed to both Joe Gregorio's and Ian Hixie's blogs, then one of the entries in Ian Hixie's blog appears with the wrong date. The post that appears with the incorrect date is State of the WHAT from Ian Hixie's blog, which is linked to from Joe Gregorio's post 3270 Redux. Instead of being dated 2004-06-29, as it is in Ian's RSS feed, it is dated 2004-06-05, which is the date of Joe's post.

The problem arises from a workaround we came up with for feeds that don't provide dates. Many users dislike feeds without dates and prefer that we display some default date for such feeds, so what we ended up doing was using the date an item was first seen in the feed as that item's date. In many cases this date isn't accurate. The inaccuracy is particularly glaring when a post from a dated feed links to an entry from a feed with no dates, because it may look like a feed is linking to a post in the future. For example, Joe Gregorio's post dated 2004-06-05 links to a post made by Ian Hixie on 2004-06-29. In this case the discrepancy is legitimate, because Joe Gregorio went back and edited his blog post but didn't update the date in his feed. However RSS Bandit assumes the discrepancy in the dates exists because we guessed the date for the entry in Ian's blog, and thus "corrects" it by aligning it with the date from Joe's entry. The rationale for this behavior is that guessing that an undated entry was posted on the same day someone linked to it is more accurate than guessing that it was posted when it was fetched. The bug is that when we apply this heuristic we don't check whether the entry whose date is being adjusted is actually an undated entry.
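
To make the heuristic concrete, here is a minimal sketch with hypothetical type and member names (this is not RSS Bandit's actual code); the DateWasGuessed check is the piece that was missing.

    using System;

    // A sketch of the date-adjustment heuristic described above. The type and
    // member names are hypothetical; this is not RSS Bandit's actual code.
    class FeedItem
    {
        public DateTime Date;        // the date displayed for the item
        public bool DateWasGuessed;  // true if the feed had no date and we used the fetch time
    }

    class DateHeuristic
    {
        // Called when an item from a dated feed ('linkingItem') links to another item.
        public static void AdjustLinkedItemDate(FeedItem linkingItem, FeedItem linkedItem)
        {
            // The missing check was DateWasGuessed: only entries whose dates we
            // guessed should ever be pulled back to the date of the linking post.
            if (linkedItem.DateWasGuessed && linkedItem.Date > linkingItem.Date)
            {
                linkedItem.Date = linkingItem.Date;
            }
        }

        static void Main()
        {
            FeedItem joesPost = new FeedItem();
            joesPost.Date = new DateTime(2004, 6, 5);
            joesPost.DateWasGuessed = false;

            FeedItem iansPost = new FeedItem();
            iansPost.Date = new DateTime(2004, 6, 29); // real date from Ian's feed
            iansPost.DateWasGuessed = false;           // so it must not be adjusted

            AdjustLinkedItemDate(joesPost, iansPost);
            Console.WriteLine(iansPost.Date.ToString("yyyy-MM-dd")); // prints 2004-06-29
        }
    }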

This has been fixed in the current codebase. The next question is whether we should actually be adjusting dates in this manner in any case.


 

Categories: RSS Bandit

July 11, 2004
@ 12:50 AM

In a post entitled Dare Obasanjo is raining on the W3C's parade, Mike Dierken responds to my recent post which asks Is the W3C Becoming Irrelevant? by writing

Either way the primary mechanism the W3C uses to produce technology specs is to take a bunch of contradictory and conflicting proposals then have a bunch of career bureaucrats try to find some compromise that is a union of all the submitted specs

Damn those career bureaucrats that built XML. Or is it the SOAP design process that caused the grief? And where did that technology come from anyway?

My original post already described the specs that have caused grief and that show the W3C is losing its way. I assume that Mike is trying to use XML 1.0 and SOAP 1.1 as counterexamples to the trend I pointed out. Well, first of all, XML 1.0 was a proposal to design a subset of SGML, so by definition it could not suffer the same problems that face attempts to innovate by committee, which have hampered the W3C in recent times. Also, when XML 1.0 was created the effort was much smaller and a majority of the participants in the subsetting of SGML had similar goals. As for SOAP 1.1, it isn't a W3C spec. SOAP 1.1 was created by Don Box, Dave Winer and a bunch of Microsoft and IBM folks and then submitted to the W3C as a W3C Note.

Of course, the W3C has created iterations of both specs (XML 1.1 & SOAP 1.2), which in both cases are backwards incompatible with the previous versions. I leave it as an exercise to the reader to decide whether having backwards incompatible point releases of Web specifications is how one 'leads the Web to its full potential'.


 

Categories: XML