July 25, 2004
@ 12:30 AM

Recently there have been some complaints about duplicate entries showing up in RSS Bandit. This is due to a change I made in the most recent version of RSS Bandit. RSS 2.0 has a guid element that can be used to uniquely identify an item in an RSS feed. Unfortunately this element is optional, so most aggregators end up falling back on the link element in feeds that don't provide guids.

For the most part this worked fine. However I stumbled across a feed that used the same link for each item from a given day: the Cafe con Leche RSS feed. This meant that RSS Bandit couldn't differentiate between items posted on the same day, which was particularly important when tracking what items a user had read or whether an item had already been downloaded. I should have pinged the owner of the feed to point this problem out, but instead I decided to code around the issue by using the combination of the link and title elements to uniquely identify items. This actually turned out to be worse.

Although this fixed the problem with the Cafe con Leche RSS feed, it caused other issues: any time an item in a feed changes its title but keeps the same permalink (for example, when a typo in the title is fixed), RSS Bandit thinks it's a different post and a duplicate entry shows up in the list view. Since popular sites like Boing Boing and Slashdot tend to do this almost every other day, I turned a problem with a niche site that affected a few users into one that affects a number of popular websites, and thus lots of users.
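To make the tradeoff concrete, here is a rough sketch of the identity logic involved; the names here are mine, not RSS Bandit's actual code:

```csharp
using System;
using System.Xml;

// Rough sketch of computing an identity key for an RSS <item> element.
class FeedItemIdentity
{
    static string GetItemKey(XmlElement item)
    {
        // Best case: the feed provides a guid, RSS 2.0's unique identifier.
        XmlNode guid = item.SelectSingleNode("guid");
        if (guid != null)
            return guid.InnerText;

        XmlNode link = item.SelectSingleNode("link");
        XmlNode title = item.SelectSingleNode("title");
        string linkText = link == null ? String.Empty : link.InnerText;
        string titleText = title == null ? String.Empty : title.InnerText;

        // The link alone breaks on feeds like Cafe con Leche that reuse one
        // link for every item posted on the same day. The workaround described
        // above, link plus title, breaks in the other direction: editing a
        // title changes the key, so the corrected item shows up as a duplicate.
        return linkText + "|" + titleText;
    }
}
```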

This problem will be fixed in the next version of RSS Bandit.


 

Categories: RSS Bandit

July 24, 2004
@ 08:51 PM

In the past couple of months Google has hired four people who used to work on Internet Explorer in various capacities [especially its XML support] and then moved to BEA: David Bau, Rod Chavez, Gary Burd and, most recently, Adam Bosworth. A number of my coworkers used to work with these guys since our team, the Microsoft XML team, was once part of the Internet Explorer team. It's been interesting chatting in the hallways with folks contemplating what Google would want to build that requires people with a background in building XML data access technologies on both the client side (Internet Explorer) and the server side (BEA's WebLogic).

Another interesting recent Google hire is Joshua Bloch. He is probably the most visible person working on the Java language at Sun after James Gosling. Based on recent interviews with him about Java, his most recent endeavors involved adding new features to the language that mimic those in C#.

While chomping on some cheap sushi at Sushi Land yesterday, some friends and I wondered what Google could be planning next. So far the software industry, including my employer, has been playing catch-up with Google and reacting to their moves. According to news reports, MSN is trying to catch up to Google search, and Hotmail has upped its free storage limit to compete with GMail. However this is all reactive, and we still haven't seen significant competition to Google News, Google Image Search, Google Groups or even, to a lesser extent, Orkut and Blogger. By the time the major online networks like AOL, MSN or Yahoo! can provide decent alternatives to this media empire, Google will have produced their next major addition.

So far Google doesn't seem to have stitched all its pieces into a coherent media empire the way competitors like Yahoo! have, but that seems like only a matter of time. What is of more interest to the geek in me is what Google could build next that could tie it all together. As Rich Skrenta wrote in his post The Secret Source of Google's Power:

Google is a company that has built a single very large, custom computer. It's running their own cluster operating system. They make their big computer even bigger and faster each month, while lowering the cost of CPU cycles. It's looking more like a general purpose platform than a cluster optimized for a single application.

While competitors are targeting the individual applications Google has deployed, Google is building a massive, general purpose computing platform for web-scale programming.

A friend of mine, Justin, had an interesting idea at dinner yesterday. What if Google ends up building the network computer? They can give users the storage space and reliability to place all their data online. They can mimic the major desktop applications users interact with daily by using Web technologies. This sounds far-fetched, but then again, I'd never have imagined I'd see a free email service that offered 1GB of storage.

Although I think Justin's idea is outlandish, I suspect the truth isn't far from it.

Update: It seems Google also picked up another Java language guy from Sun: Neal Gafter, who worked on various Java compiler tools including javac, javadoc and javap. Curiouser and curiouser.


 

Categories: Technology

During the most recent Download.Ject Internet Explorer incident [which was significant enough that I saw newspaper headlines and TV news reports advising people to switch browsers] I got some requests from RSS Bandit users to switch the browser used by RSS Bandit, since they'd switched away from Internet Explorer due to security concerns.

Torsten and I looked around to see how feasible this would be and found the Mozilla ActiveX control, which enables one to embed the Mozilla browser engine (Gecko) into any ActiveX application. The control implements the same APIs as the Internet Explorer control, so it may be straightforward to make this change.

I have some concerns about doing this.

  1. We've had weird interactions with COM interop between RSS Bandit and IE, resulting in bugs like dozens of IE windows being spawned and, most recently, memory corruption errors. I am wary of moving to an unknown quantity like Gecko and facing similar issues without the benefit of a background in working with the component.

  2. There's a question of whether we replace our dependency on IE outright, ship an option to use Gecko instead of IE, or just ship a Gecko version and an IE version (one way to structure such an option is sketched below). The installer for the Mozilla ActiveX control is currently larger than the RSS Bandit download, so we'd more than double the size of our download if we tied ourselves to Gecko.
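If we did ship both, one way to keep the choice contained would be to hide the embedded browser behind a small interface and pick the implementation at startup. This is a hypothetical sketch; none of these type names come from the actual RSS Bandit codebase:

```csharp
// Hypothetical sketch: isolate the embedded browser behind an interface
// so either the IE or the Gecko ActiveX control can be chosen at startup.
public interface IHtmlRenderer
{
    void Navigate(string url);
    void RenderHtml(string html);
}

public class IeRenderer : IHtmlRenderer
{
    // Would wrap the Internet Explorer ActiveX control.
    public void Navigate(string url) { /* delegate to the IE control */ }
    public void RenderHtml(string html) { /* delegate to the IE control */ }
}

public class GeckoRenderer : IHtmlRenderer
{
    // Would wrap the Mozilla ActiveX control, which mirrors the IE control's API.
    public void Navigate(string url) { /* delegate to the Gecko control */ }
    public void RenderHtml(string html) { /* delegate to the Gecko control */ }
}

public class RendererFactory
{
    // Picks an implementation based on a user preference setting.
    public static IHtmlRenderer Create(bool useGecko)
    {
        return useGecko ? (IHtmlRenderer)new GeckoRenderer() : new IeRenderer();
    }
}
```

Since the Mozilla control implements the same APIs as the IE control, the two wrappers would be nearly identical, which is what makes the experiment feasible at all.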

I'm curious as to what RSS Bandit users think. Currently I don't think I'm going to add such a switch to our plans, but I am always interested in feedback from our users on what they think the right thing to do is.


 

Categories: RSS Bandit

July 21, 2004
@ 07:15 AM

It seems every 3 months some prominent online publication complains about the amount of traffic RSS news readers cause to websites that provide RSS feeds. This time it is Slashdot with their post When RSS Traffic Looks Like a DDoS, which references a post by Chad Dickerson, the CTO of InfoWorld, entitled RSS growing pains. Chad writes:

Several months ago, I spoke to a Web architect at a large media site and asked why his site didn’t support RSS. He raised the concern that thousands (or even millions) of dumb clients could wreak havoc on a popular Web site. Back when I was at CNN.com, I recall that our servers got needlessly pounded by a dumb client (IE4) requesting RSS-like CDF files at frequent intervals regardless of whether they had changed. As the popularity of RSS feeds at InfoWorld started to surge, I began to notice that most of the RSS clients out there requested and downloaded our feeds regardless of whether the feeds themselves had changed. At the time, we hadn’t quite reached the RSS tipping point, so I filed these thoughts away for later -- but “later” came sooner than I thought.

At this point I'd like to note that HTTP provides two mechanisms for web servers to tell clients whether a network resource has changed: the Last-Modified/If-Modified-Since and ETag/If-None-Match header pairs, together known as HTTP conditional GET. The basics are explained in the blog post HTTP Conditional Get for RSS Hackers, which shows how to prevent clients such as news readers from repeatedly downloading a Web document that hasn't been updated. As of this writing, the InfoWorld RSS feed supports neither mechanism.
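For aggregator authors wondering what this looks like in practice, here is a minimal sketch of a conditional GET in C#; the URL and the way the validators are cached are illustrative:

```csharp
using System;
using System.IO;
using System.Net;

// Minimal sketch of a feed poller using HTTP conditional GET.
class ConditionalGetSketch
{
    static DateTime lastModified = DateTime.MinValue; // saved from the previous fetch
    static string etag;                               // saved from the previous fetch

    static void Poll(string feedUrl)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(feedUrl);

        // Send back the validators saved from the last successful download.
        if (lastModified != DateTime.MinValue)
            request.IfModifiedSince = lastModified;  // sends If-Modified-Since
        if (etag != null)
            request.Headers["If-None-Match"] = etag; // sends the saved ETag

        try
        {
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                // 200 OK: the feed changed, so download it and save the new validators.
                lastModified = response.LastModified;
                etag = response.Headers["ETag"];
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    Console.WriteLine("Feed changed: {0} characters", reader.ReadToEnd().Length);
                }
            }
        }
        catch (WebException ex)
        {
            // HttpWebRequest surfaces 304 Not Modified as a WebException.
            HttpWebResponse response = ex.Response as HttpWebResponse;
            if (response != null && response.StatusCode == HttpStatusCode.NotModified)
                Console.WriteLine("Feed unchanged: nothing downloaded");
            else
                throw;
        }
    }
}
```

A server that honors these headers answers an unchanged feed with a 304 status and no body at all, a few hundred bytes instead of the full feed.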

Another technique for reducing the bandwidth consumed by HTTP clients is HTTP compression, which greatly reduces the amount of data that has to be sent to a client when the feed does have to be downloaded. For example, the current InfoWorld feed is 7427 bytes, which shrinks to 2551 bytes when compressed with GZip on my home machine. That is a reduction by a factor of about 3, and on larger files the compression ratio is even better. Again, InfoWorld doesn't support this technique for reducing bandwidth consumption.
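Enabling compression on the client side is similarly small. A minimal sketch, assuming .NET 2.0's AutomaticDecompression support; earlier frameworks have to decompress the response stream by hand:

```csharp
using System.IO;
using System.Net;

// Minimal sketch of fetching a feed with GZip compression enabled.
class GzipFetchSketch
{
    static string FetchFeed(string feedUrl)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(feedUrl);

        // Sends "Accept-Encoding: gzip" and transparently decompresses
        // the response body if the server compressed it.
        request.AutomaticDecompression = DecompressionMethods.GZip;

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```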

It is unsurprising that they are seeing significant bandwidth consumption from news aggregators. An RSS reader polling the InfoWorld site once an hour over an 8-hour period would download about 60 kilobytes of XML (eight full downloads of the 7427-byte feed). On the other hand, if they supported HTTP conditional GET requests and HTTP compression via GZip encoding, this number would be under 3 kilobytes.

The one thing HTTP doesn't provide is a way to deal with numerous clients connecting to a site at once. However, this problem isn't much different from the traditional scaling problem that websites have to deal with today when they get a lot of traffic from regular readers.


 

Today Arpan (the PM for XML query technologies in the .NET Framework) and I were talking about features we'd like to see on our 'nice to have' list for the Orcas release of the .NET Framework. One of the things we thought would be really nice to see in the System.Xml namespace was XPath 2.0. Then Derek, being the universal pessimist, pointed out that we already have APIs that support XPath 1.0 which take only a string as an argument (e.g. XmlNode.SelectNodes), so we'd have difficulty adding support for another version of XPath without contorting the API.
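To see Derek's point, note that the query string is the only input, so there is nowhere to say which version of XPath it is written in. A minimal sketch of the problem:

```csharp
using System;
using System.Xml;

// The SelectNodes call takes only the query string, so there is no way
// to indicate whether the expression is XPath 1.0 or XPath 2.0.
class XPathVersionSketch
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<books><book price='9.99'/><book price='29.99'/></books>");

        // Implicitly an XPath 1.0 query today. If the engine underneath were
        // upgraded to XPath 2.0, any expression whose meaning differs between
        // the two versions would silently change behavior.
        XmlNodeList cheap = doc.SelectNodes("/books/book[@price < 10]");
        Console.WriteLine("{0} cheap book(s)", cheap.Count);
    }
}
```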

Not to be dissuaded, I pointed out that XPath 2.0 has a backwards compatibility mode which makes it compatible with XPath 1.0. Thus we wouldn't have to change our Select methods or introduce new methods for XPath 2.0 support, since all queries that worked against our Select methods in the past would still work if we upgraded our XPath implementation to version 2.0. This is where Arpan hit me with the one-two punch: he introduced me to a section of the XPath 2.0 spec called Incompatibilities when Compatibility Mode is true, which reads

The list below contains all known areas, within the scope of this specification, where an XPath 2.0 processor running with compatibility mode set to true will produce different results from an XPath 1.0 processor evaluating the same expression, assuming that the expression was valid in XPath 1.0, and that the nodes in the source document have no type annotations other than xdt:untypedAny and xdt:untypedAtomic.

I was stunned by what I read and I am still stunned now. The W3C created XPath 2.0, which is backwards incompatible with XPath 1.0, and added a compatibility mode option to make it backwards compatible with XPath 1.0, yet it still isn't backwards compatible even in this mode? This seems completely illogical to me. What is the point of having a backwards compatibility mode if it isn't backwards compatible? Well, I guess now I know that if we do decide to ship XPath 2.0 in the future we can't just add support for it transparently to our existing classes without causing some API churn. Unfortunate.

Caveat: The fact that a technology is mentioned as being on our 'nice to have' list or is suggested in a comment to this post is not an indication that it will be implemented in future versions of the .NET Framework.


 

Categories: XML

July 17, 2004
@ 02:40 AM

Dave Winer writes:

Russ Beattie says we should be careful not to give the Republicans ammo to kill Kerry. I am sorry Russ, I'm not worried about that. I'm more worried that the Dems are too flustered by the hardball tactics of the Reps to fight back.

The only time I tend to watch regular TV that isn't TiVo is while working out in the morning at the health club. I've noticed that while John Kerry's ads tend to be about the qualities that make him a good candidate for president, George Bush's ads have mostly been negative ads attacking John Kerry. Personally I would love it if Kerry's campaign continued to take the high ground and showed the Republican party up for the rabid attack dogs that they are. The problem is that negative ads work, and some people look at not hitting back as a sign of weakness, which seems to be how Dave Winer sees it.

Whatever happened to trying to change the tone in Washington and elevate the discourse? Just another case of "Do what I say, not what I do", I guess.


 

Categories: Ramblings

I was reading an XML-Deviant column on XML.com entitled Browser Boom when I came across the following excerpt:

The inevitable association with Microsoft's CLI implementation is proving a source of difficulty for the Mono project. The principal author of Mono's XML support, Atsushi Eno, posted to the Mono mailing list on the problems of being conformant in Mono's XML parser implementation. More specifically, whose rules should Mono conform to: the W3C's or Microsoft's?

MS XmlTextReader is buggy since it accepts XML declaration as element content (that violates W3C XML specification section 3 Logical Structures). ... However, there is another discussion that it is useful that new XmlTextReader (xmlText, XmlNodeType.Element, null) accepts XML declaration.

... that error-prone XmlTextReader might be useful (especially for people who already depends on that behavior)

... we did not always reject Microsoft badness; for example we are copying System.Xml.XmlCDataSection that violates W3C DOM interface hierarchy (!)

The root of the dilemma is similar to that which Mozilla and Opera are trying to manage in the browser world.

What I find interesting is that instead of pinging the MSFT XML folks (like myself) and filing a bug report, this spawned a dozen-message email discussion on whether Mono should be bug-compatible with the .NET Framework. Of course, if the Mono folks decide to be bug-compatible with this and other bugs in System.Xml, and we then fix them, causing breaking changes in some cases, will we see complaints about how Microsoft is out to get them by being backwards incompatible? Now that Microsoft has created the MSDN Product Feedback Center, they don't even have to track down the right newsgroup or email address of a Microsoft employee to file the bug.

It's amazing to me how much work people create for themselves, and what conspiracy theories they'd rather entertain, instead of simply communicating with others.

Update: I talked to the developer responsible for the XmlTextReader class and she responded, "This is by design. We allow XML declaration in XML fragments because of the encoding attribute. Otherwise the encoding information would have to be transferred outside of the XML and manually set into XmlParserContext."
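For the curious, this is the behavior in question; a minimal repro sketch, not code from either codebase:

```csharp
using System;
using System.Xml;

// Minimal repro: the fragment constructor accepts an XML declaration at the
// start of element content, which a strict reading of the XML spec disallows,
// so that encoding information can travel with the text.
class FragmentSketch
{
    static void Main()
    {
        string fragment = "<?xml version='1.0'?><item>hello</item>";

        XmlTextReader reader = new XmlTextReader(fragment, XmlNodeType.Element, null);
        while (reader.Read())
            Console.WriteLine(reader.NodeType);
    }
}
```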


 

Categories: Life in the B0rg Cube | XML

Taken from an article on TheServerSide.com entitled Microsoft Responds to Sun’s Web Service Benchmarks:

In a paper published last month, Sun claimed that Java based web services outperform .NET based web services both in throughput and response times. Microsoft has released a paper on TheServerSide.NET responding to those claims stating that Sun’s representation of the .NET performance was understated by 2 to 3 times and that in many, but not all cases, .NET exceeded the Java benchmarks.
...
Read the Microsoft response on TheServerSide.NET: Web Services Performance: Comparing J2EE and .NET

Read Sun's original paper: J2EE claimed to have better Web Services performance than .NET

It should be noted that Sun did not publish the source code for their benchmark, so Microsoft had to re-create Sun's benchmark based on the details from the original paper. The Microsoft response has the full source code for both the .NET Web Service implementation and the Java Web Service implementation using Sun's JWSDP 1.4, along with the test program used to benchmark both services. I've always believed the best way to verify a benchmark is to run it yourself. The performance of the .NET XML Web Service implementations should prove to be a lot better than what is implied by the original paper from Sun.


 

July 14, 2004
@ 09:14 PM

In the midst of a back-and-forth internal discussion on whether it is appropriate for folks to be griping about the recently announced MSFT benefit cuts on their work-related blogs, someone sent me a link to the Mini-Microsoft blog, which describes itself thusly:

Let's slim down Microsoft into a lean, mean, efficient customer pleasing profit making machine! Mini-Microsoft, Mini-Microsoft, lean-and-mean!

Subscribed!!!


 

A little while ago some members of our team experimented with various ways to reduce the Relational<->Objects<->XML (ROX) impedance mismatch by adding concepts and operators from the relational and XML (specifically W3C XML Schema) worlds into an object-oriented programming language. This effort was spearheaded by a number of smart folks on our team including Erik Meijer, Matt Warren, Chris Lovett and a bunch of others, all led by William Adams. The object-oriented programming language used as a base for extension was C#. The new language was once called X# but eventually became known as Xen.

Erik Meijer presented Xen at XML 2003 and I blogged about his presentation after the conference. There have also been two papers published about the ideas behind Xen: Programming with Rectangles, Triangles, and Circles and Unifying Tables, Objects and Documents. It's a new year and the folks working on Xen have moved on to other endeavors related to future versions of Visual Studio and the .NET Framework.

However Xen is not lost. It is now part of the Microsoft Research project Cω (pronounced C-Omega). Even better, you can download a preview of the Cω compiler from the Microsoft Research downloads page.


 

Categories: Technology | XML