March 11, 2004
@ 04:54 PM

Dave Winer has a proposal for merging RSS and Atom. I'm stunned. It takes a bad idea (coming up with a redundant XML syndication format that is incompatible with existing ones) and merges it with a worse idea (having all the people who dislike Dave Winer work with him).

After adding Atom support to RSS Bandit, a thought that had been forming for a while finally crystallized: Atom really is just another flavor of RSS with different tag names. It looks like I'm not the only aggregator author to come to this conclusion; Luke Hutteman came to the same one when describing SharpReader's implementation of Atom. What this means in practice is that once you've written code that handles one flavor of RSS, be it RSS 0.91, RSS 1.0, or RSS 2.0, adding support for the other flavors isn't that hard because they all carry basically the same information, just hidden behind different tag names (pubDate vs. dc:date, admin:errorReportsTo vs. webMaster, author vs. dc:creator, etc.). To the average user of any popular aggregator there isn't any noticeable difference between being subscribed to an RSS 1.0 feed vs. an RSS 2.0 feed, or an RSS 2.0 feed vs. an Atom feed.
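To make the tag-name point concrete, below is a minimal sketch of the kind of normalization an aggregator can do when mapping the various flavors onto a single object model. The FeedItem class and the element mappings shown are illustrative; this is not RSS Bandit's actual code.

using System.Xml;

class FeedItem {
    public string Author;
    public string Date;

    // Map equivalent elements from the different feed flavors onto one model.
    // Production code would match on namespace URI rather than on prefix.
    public static FeedItem FromXml(XmlElement item) {
        FeedItem result = new FeedItem();
        foreach (XmlNode node in item.ChildNodes) {
            XmlElement child = node as XmlElement;
            if (child == null) continue;
            switch (child.Name) {
                case "pubDate":     // RSS 0.91 / RSS 2.0
                case "dc:date":     // RSS 1.0 (Dublin Core)
                case "issued":      // Atom 0.3
                    result.Date = child.InnerText;
                    break;
                case "author":      // RSS 2.0
                case "dc:creator":  // RSS 1.0 (Dublin Core)
                    result.Author = child.InnerText;
                    break;
            }
        }
        return result;
    }
}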

And just like with RSS, aggregators will special-case popular Atom feeds that exhibit weird behavior not described in any spec or that interpret the specs in unconventional ways. As Phil Ringnalda points out, Blogger Atom feeds claim that the summary contains XHTML when in fact it contains plain text. This doesn't sound like a big deal until you realize that in XHTML whitespace isn't significant (e.g. newlines are treated as spaces), which leads to poorly formatted content when the aggregator displays the summary as XHTML when in truth it is plain text. Sam Ruby's Atom feed contains relative links in the <url> and <link> elements but doesn't use xml:base. There is code in most aggregators to deal with weird but popular RSS feeds and it seems Atom is already gearing up to be the same way. Like I said, just another flavor of RSS. :)
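For the relative link case, the fix on the aggregator side is plain URI resolution. Here is a sketch; falling back to the URL the feed was fetched from is my assumption about reasonable behavior, not what any particular aggregator actually does, and for brevity it only checks the element itself for xml:base rather than walking up to ancestors.

using System;
using System.Xml;

class LinkResolver {
    const string XmlNamespace = "http://www.w3.org/XML/1998/namespace";

    public static Uri ResolveLink(XmlElement linkElement, Uri feedUri) {
        // Use xml:base if the element declares one, otherwise fall back
        // to the URL the feed itself was fetched from.
        string xmlBase = linkElement.GetAttribute("base", XmlNamespace);
        Uri baseUri = (xmlBase.Length == 0) ? feedUri : new Uri(feedUri, xmlBase);
        return new Uri(baseUri, linkElement.InnerText); // resolves relative links
    }
}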

As an aside, I find it interesting that Sam Ruby's RSS 2.0 feed currently provides a much better user experience for readers than his Atom feed. The following information is in Sam's RSS feed but not his Atom feed:

  • Email address of the webmaster of the site [i.e. who to send error reports to]
  • The number of comments per entry
  • An email address for sending a response to an entry
  • A web service endpoint for posting comments to an entry from an aggregator
  • An identifier for the tool that generated the feed
  • The trackback URL of each entry

What this means is that if you subscribe to Sam's RSS feed with an aggregator such as SharpReader or RSS Bandit you'll get a better user experience than if you subscribed to his Atom feed. Of course, Sam could easily put all the namespace extensions from his RSS feed in his Atom feed as well, in which case the user experience of subscribing to either feed would be indistinguishable.

Arguing about XML syndication formats is meaningless because the current crop all pretty much do the same thing. On that note, I'd like to point out that websites that provide multiple syndication formats are quite silly. Besides confusing people trying to subscribe to the feed, there isn't any reason to provide an XML syndication feed in more than one format. Particularly silly are the sites that provide both RSS and Atom feeds (like mine).

Blogger gets this right by providing only one feed format per blog (RSS or Atom). Where they screwed up is in forcing users to make the choice instead of making it for them. That's on par with asking readers whether they want the blog served up using HTTP 1.0 or HTTP 1.1. I'm sure there are some people who care, but for the most part it is a pointless technical question to shove in the face of your users.


 

Categories: XML

I just read a post by Bruce Eckel entitled Generics Aren't where he writes:

My experience with "parameterized types" comes from C++, which was based on Ada's generics... In those languages, when you use a type parameter, that parameter takes on a latent type: one that is implied by how it is used, but never explicitly specified.

In C++ you can do the equivalent:

class Dog {
public:
  void talk() { }
  void reproduce() { }
};

class Robot {
public:
  void talk() { }
  void oilChange() { }
};

template<class T> void speak(T speaker) {
  speaker.talk();
}

int main() {
  Dog d;
  Robot r;
  speak(d);
  speak(r);
}

Again, speak() doesn't care about the type of its argument. But it still makes sure – at compile time – that it can actually send those messages. But in Java (and apparently C#), you can't seem to say "any type." The following won't compile with JDK 1.5 (note you must invoke the compiler with the -source "1.5" flag to compile Java Generics):

public class Communicate  {
  public <T> void speak(T speaker) {
    speaker.talk();
  }
}

However, this will:

public class Communicate  {
  public <T> void speak(T speaker) {
    speaker.toString(); // Object methods work!
  }
}

Java Generics use "erasure," which drops everything back to Object if you try to say "any type." So when I say <T>, it doesn't really mean "anything" like C++/Ada/Python etc. does, it means "Object." Apparently the "correct Java Generic" way of doing this is to define an interface with the speak method in it, and specify that interface as a constraint. This compiles:

interface Speaks { void speak(); }

public class Communicate  {
  public <T extends Speaks> void speak(T speaker) {
    speaker.speak();
  }
}

What this says is that "T must be a subclass or implementation of Speaks." So my reaction is "If I have to specify a subclass, why not just use the normal extension mechanism and avoid the extra clutter and confusion?"

You want to call it "generics," fine, implement something that looks like C++ or Ada, that actually produces a latent typing mechanism like they do. But don't implement something whose sole purpose is to solve the casting problem in containers, and then insist on calling it "Generics."

Although Bruce didn't confirm whether the above limitation exists in the C# implementation of generics, he is right that C# generics have the same limitation as Java generics. The article An Introduction to C# Generics on MSDN describes the same restrictions Bruce encountered with Java generics and shows how to work around them using constraints, just as Bruce discovered. If you read the problem statement in the MSDN article, it seems the main goal of C# generics is likewise to solve the casting problem in containers.
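For comparison, here is what the constrained version of Bruce's example looks like in C# 2.0. The interface and method names simply mirror his Java sample; they aren't taken from the MSDN article.

interface ISpeaks { void Speak(); }

class Dog : ISpeaks {
    public void Speak() { /* woof */ }
}

class Communicate {
    // Without the 'where' constraint, only System.Object's members would be
    // available on 'speaker', exactly as in Java.
    public static void Speak<T>(T speaker) where T : ISpeaks {
        speaker.Speak();
    }
}

class Program {
    static void Main() {
        Communicate.Speak(new Dog()); // T is inferred as Dog
    }
}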

What I find interesting about Bruce's post is the implication that to properly implement generics one must provide duck typing. I've always thought the behavior of templates in C++ was weird in that one could pass in a parameter without enforcing constraints on the behavior of the type. Yet it isn't really dynamic or latent typing because there is compile-time checking of whether the type supports those methods or operations.

A few years ago I wrote an article entitled C++ in 2005: Can It Become A Java Beater? in which I gave some opinions on an interview with Bjarne Stroustrup where he discussed various language features he'd like to see in C++. One of those features was constraints on template arguments; below is an excerpt from my article on this topic:

Constraints for template arguments

Bjarne: This can be simply, generally, and elegantly expressed in C++ as is.

Templates are a C++ language facility that enables generic programming via parametric polymorphism. The principal idea behind generic programming is that many functions and procedures can be abstracted away from the particular data structures on which they operate and thus can operate on any type.

In practice, the fact that templates can work on any type of object can lead to unforeseen and hard-to-detect errors in a program. It turns out that although most people like the fact that template functions can work on many types without the data having to be related via inheritance (unlike Java), there is a clamor for a way to specialize these functions so that they only accept or deny a certain range of types.

The most common practice for constraining template arguments is to have a constraints() function that tries to assign an object of the template argument class to a specified base class's pointer. If the compilation fails then the template argument did not meet the requirements. 

The point I'm trying to get at is that both C++ users and its inventor felt that being able to constrain the operations you can perform on a parameterized type, as opposed to relying on duck typing, was a desirable feature.

The next thing I want to point out is that Bruce mentions that generic programming in C++ was based on Ada's generics, so I decided to spend some time reading up on them to see if they also support duck typing. I read Chapter 12 of the book Ada 95: The Craft of Object-Oriented Programming, where we learn:

In the case of a linked list package, we want a linked list of any type. Linked lists of arrays, records, integers or any other type should be equally possible. The way to do this is to specify the item type in the package declaration as private, like this:

    generic
        type Item_Type is private;
    package JE.Lists is
        ...
    end JE.Lists;

The only operations that will be allowed in the package body are those appropriate to private types, namely assignment (:=) and testing for equality and inequality (= and /=). When the package is instantiated, any type that meets these requirements can be supplied as the actual parameter. This includes records, arrays, integers and so on; the only types excluded are limited types...

As you can see, the way you declare your generic type parameters puts restrictions on what operations you can perform on the type inside the package as well as what types you can supply as parameters. Specifying the parameter as ‘range <>’ allows the package to use all the standard operations on integer types but restricts you to supplying an integer type when you instantiate the package. Specifying the parameter as ‘private’ gives you greater freedom when you instantiate the package but reduces the range of operations that you can use inside the package itself.

So it looks like Ada gave you two options, neither of which looks like what you can do in C++. You could either pass in any type, in which case the only operations allowed on it were assignment and equality testing, or you could pass in a constrained type. Thus it doesn't look like Ada generics had the weird mix of static and duck typing that C++ templates have.
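Incidentally, an unconstrained type parameter in C# behaves much like an Ada private formal type: assignment and Object-level equality are about all you get. This analogy is mine, not something from the Ada book quoted above.

class Node<T> {
    T item;

    public void Store(T value) {
        item = value;                // assignment: always allowed
    }

    public bool Matches(T value) {
        // equality via Object.Equals is allowed, but something like
        // item.CompareTo(value) won't compile without a constraint
        return item.Equals(value);
    }
}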

I am as disappointed as Bruce that neither C# nor Java supports dynamic typing like languages such as Python or Smalltalk, but I don't think parametric polymorphism via generics has ever been used to solve this problem. As I have pointed out, neither Ada nor C++ actually gives him the functionality he wants, so I wouldn't be too hard on Java or C# if I were in his shoes.


 

Categories: Technology

Sam Ruby writes

 Ted Leung: If I'm looking for thought leadership from the community, in the Java community, I'm looking towards the non Sun bloggers -- these are the folks doing AOP, Groovy, SGen, Prevalence, WebWork, etc. This shows the rich ecosystem that has grown up around Java. If I look at the .NET community, I pretty much look for the MS bloggers.

Let's not confuse cause and effect here.  There used to be plenty of .Net bloggers who didn't work for Microsoft. 

It seems Sam and Ted have different ideas from mine of what thought leadership is. When I think of thought leadership I think of ideas that add to the pool of common practices or change the way developers work and think. Examples are the ideas in the GoF's Design Patterns or the writings of Joel Spolsky.

I read a lot of blogs from Microsoft and non-Microsoft people about .NET development, and I see more thought leadership from the non-Microsoft people. What I see from Microsoft people is what I'll term accidental thought leadership. Basically, if I'm the developer or PM who designed or implemented component X, then it stands to reason that I'm better placed to talk about it than others. Similarly, if I'm one of the folks designing or implementing future technology Y, then it stands to reason I'd be the best placed to talk about Longhorn/Indigo/Avalon/WinFS/Whidbey/Yukon/etc. It is also more interesting to read about upcoming technology than about how best to use existing technology, which is why people tend to flock to the blogs of the folks working on future stuff and ignore the Microsoft bloggers talking about existing technologies until they need a workaround for some bug.

Personally, the only real thought leadership I've seen from the 200 or so Microsoft blogs I read has come from folks like Erik Meijer and Don Box. I see a lot of Microsoft people blogging about SOA, but to me most of their posts are warmed-over ideas that folks like Pat Helland have been talking about for years. When I think of thought leadership in the .NET world I'm more likely to think of Sam Gentile or Clemens Vasters than of some blue-badge-carrying employee on the Redmond campus.

What I do find interesting is that a Sun employee, Simon Phipps, is actually trying to use this to score points, claiming that the lack of insightful Sun bloggers is due to a "wide community as you'd expect from the openness of the JCP". When Microsoft folks weren't blogging and directly interacting with our developer community, people railed because they felt the company was aloof and distant from its developers. Now that we try to participate more, it is taken as a sign that “it's a closed-source dictatorship - no amount of pushing up-hill will fix that”. I guess you can't win them all. :)


 

Categories: Ramblings

I recently started using Real Player again after a few years away and it does seem a lot less user-hostile. Apparently this is the result of some internal turmoil at Real Networks. Some interesting reading about what went on behind the scenes at Real, and how it ended up affecting their product, has been making the rounds in a couple of popular blogs.


 

MSDN has a number of Developer Centers for key developer topics such as XML Web Services and C#. There are also node home pages for less interesting [according to MSDN] topics such as Windows Scripting Host or SQLXML. Besides the fact that developer centers are highlighted more prominently on MSDN as key topics, the main differences between developer centers and node home pages are:

  1. Developer Centers have a snazzier look and feel than node home pages.

  2. Developer Centers have an RSS feed.

  3. Developer Centers can pull in blog content (e.g. Duncan Mackenzie's blog on the C# Developer Center)

For a year or more now, I've been working on getting a Developer Center on MSDN that provides a single place for developers to find out about XML technologies and products at Microsoft. The Developer Center is now about two weeks from being launched. There are only two questions left to answer.

The first question is what the tagline for the Developer Center should be. Examples of existing taglines are:

  • Microsoft Visual C# Developer Center: An innovative language and tool for building .NET-connected solutions

  • Data Access and Storage Developer Center: Harnessing the power of data

  • Web Services Developer Center: Connecting systems and sharing information

  • .NET Architecture Developer Center: Blueprint for Success

I need something similar for the XML Developer Center but my mind's been drawing a blank. My two top choices are currently “The language of information interchange” or “Bridging gaps across platforms with the ubiquitous data format”. In my frivolous moments, I've also considered “Unicode + Angle Brackets = Interoperability”. Any comments on which of the three taglines sounds best, or suggestions for new ones, would be much appreciated.

The second issue is how much we should talk about unreleased technologies. I personally dislike talking about technologies before they ship because history has taught me that projects slip or get cut when you least expect them to. For example, when I was first hired full-time at Microsoft about two years ago we were working on XQuery, which was supposed to be in version 2.0 of the .NET Framework. At the time the assumption was that both XQuery and the next version of the .NET Framework would be done by the end of 2003. It is now 2004 and it is optimistic to expect that either of them will be done by the end of this year. If we had gone off our initial assumptions and started writing on MSDN in 2002 and 2003 about XQuery and the classes we were designing for the .NET Framework (e.g. XQueryProcessor), then we'd currently have a number of outdated and incorrect articles on MSDN. On the other hand, this does mean that while you won't find articles on XQuery on MSDN, you do find articles like An Introduction to XQuery, XML for Data: An early look at XQuery, X is for XQuery, and XQuery Tricks and Traps on the developer websites of our competitors like IBM and Oracle. All four of those articles contain information that is either outdated already or will be outdated when the W3C is done with the XQuery recommendation. However, they do provide developers with a glimpse of the fundamentals of XQuery.

The question I have is whether it would be valuable to our developers if we wrote articles about technologies that haven't shipped and whose content may differ from what we actually ship. Other developer centers on MSDN, such as the Longhorn Developer Center and the Web Services Developer Center, have decided to go this route and regularly feature content that is a year or more away from shipping. I personally think this is unwise, but I am interested in what the Microsoft developer community thinks of providing content about upcoming releases versus focusing on existing ones.


 

Categories: XML

...Pimps at Sea.

Thanks to Thaddeus Frogley for the link.


 

RSS Bandit provides users with the ability to perform searches from its toolbar and view the results in the same UI used for reading blogs, as long as the results can be viewed as an RSS feed. This integration is provided for Feedster searches, and I was considering adding other blog search engines to the defaults. Torsten had given me an RSS link to some search results on PubSub and they seemed better than Feedster's in some cases. So this evening I decided to try out PubSub's weblog search and see if there was a way to provide similar integration with RSS Bandit. Unfortunately, it turns out that you have to provide them with an email address before you can perform searches.
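For context, here is roughly how this kind of integration can work when a search engine exposes its results as RSS. This is a sketch, not RSS Bandit's real code, and the URL template shown is a placeholder rather than Feedster's actual query format.

using System;
using System.Xml;

class SearchEngine {
    string urlTemplate; // e.g. "http://example.com/search?q={0}&output=rss"

    public SearchEngine(string urlTemplate) {
        this.urlTemplate = urlTemplate;
    }

    // Format the template with the search terms and hand the result to the
    // same feed-handling pipeline used for ordinary subscriptions.
    public XmlDocument Search(string terms) {
        string url = string.Format(urlTemplate, Uri.EscapeDataString(terms));
        XmlDocument feed = new XmlDocument();
        feed.Load(url); // the results come back as an ordinary RSS document
        return feed;
    }
}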

I guess they aren't in a rush to have any users.


 

Categories: Technology

I always thought that dis records created by dueling MCs over conjured-up beefs designed to sell records (e.g. the Nas vs. Jay-Z beef) were strictly a hip-hop phenomenon, so I was surprised to find out that the same thing now happens with R&B songs. From an article on MTV.com entitled That Eamon Dis Track? Ho-Wopper Now Claims He Was Behind It we read:

The beef between R&B singer Eamon and his so-called ex-girlfriend, Frankee, continues to heat up on radio, as stations across the country follow up his hit "F--- It (I Don't Want You Back)" with her dis track, "FU Right Back." Frankee's song, which uses the exact same music as "F--- It (I Don't Want You Back)," calls Eamon out as being lousy in bed, having pubic lice and generally sucking as a boyfriend. And Eamon loves every word. In fact, he claims he approved the song before the public even heard it.

Not only does he say Frankee was never his girl, he said she was handpicked by his staff to record a response to "F--- It  (I Don't Want You Back)" in order to create the illusion of a feud (see "Eamon's Alleged Old Flame Burns Him With Dis Track"). "There was a big tryout, and I actually know some of the girls who wanted to do the song, but I never met Frankee in my life," Eamon said. "I think it's corny to death, but it's funny."

I've listened to both songs; Eamon's is the better record, although Frankee's version is kind of peppy.

Speaking of fakery, there's the article I read this afternoon on Yahoo! about how the finalists on the reality show 'Last Comic Standing' were pre-picked, which has caused some of the judges, such as Drew Carey and Brett Butler, to fire barbs at NBC. Brett Butler claimed, "As panel judges, we can say that (a) we were both surprised and disappointed at the results and (b) we had NOTHING to do with them". It seems there was some fine print which indicated that the judges were just there for window dressing and the finalists were pre-picked. I guess this just goes to show that you should always read the fine print.


 

March 7, 2004
@ 06:26 PM

There's currently a semi-interesting discussion about software patents on the XML-DEV mailing list, sparked by a post by Dennis Sosnoski entitled W3C Suckered by Microsoft in which he rants angrily about Microsoft being evil for not instantly paying $521 million to Eolas and thereby kicking off a patent reform revolution. There are some interesting viewpoints voiced in the ensuing thread, including Tim Bray's suggestion that Microsoft pay Tim Berners-Lee $5 million for arguing against the Eolas patent.

The thread made me think about my own position on filing software patents, given the vocal opposition to them in some online fora. I recently got involved in patent discussions at work and jotted down my thought process as I was deciding whether filing for patents was a good idea or not. Below are the pros and cons of filing for patents from my perspective in the trenches (so to speak).

PRO

  1. Having a patent or two on your resume is a nice ego and career boost.
  2. As a shareholder at Microsoft it is in my best interest to file patents, which allow the company to defend itself from patent suits and reap revenue from patent licensing.
  3. The modest financial incentive we get for filing patents would pay for a few rounds of drinks with friends.

CON

  1. Filing patents involves meetings with lawyers.
  2. Patents are very political because you don't want to snub anyone who worked on the idea, but you also don't want to cheapen it by claiming that people who were peripherally involved were co-inventors. For example, is a tester who points out a fundamental design flaw in an idea now one of the co-inventors?
  3. There's a very slight chance that Slashdot runs an article about a particular patent claiming it is another evil plot by Microsoft. The chance is slight because the ratio of Slashdot articles about patents to patents actually filed is quite small.

That was my thought process as I sat in on some patent meetings. Basically, there is a lot of incentive to file patents for software innovations if you work for a company that can afford to do so. However, the degree of innovation involved is in the eye of the beholder [and up to prior art searches].

I've seen a number of calls for software patent reform, but none with feasible or concrete proposals behind them. Most proponents of patent reform argue something akin to “Some patent that doesn't seem innovative to me got granted, so the system needs to be changed”. How the system should be changed, and whether the new system would not have problems of its own, are left as exercises for the reader.

There have been a number of provocative writings about patent reform, the most prominent in my memory being the FSF's Patent Reform Is Not Enough and An Open Letter From Jeff Bezos On The Subject Of Patents. I suspect that the changes suggested by Jeff Bezos in his open letter do a good job of straddling the line between those who want to do away with software and business method patents and those who want to protect their investment.

Disclaimer: The above statements are my personal opinions and do not represent my employer's views in any way.


 

As pointed out in a recent Slashdot article, some researchers at HP Labs have come up with what they term a Blog Epidemic Analyzer, which aims to “track how information propagates through networks. Specifically...how web based memes get passed on from one user to another in blog networks“. It sounds like an interesting idea; it would be cool to know who the first person to send out links about All Your Base Are Belong To Us or I Kiss You was. I can also think of more serious uses for being able to track the propagation of particular links across the World Wide Web.

Unfortunately, it seems the researchers behind this are either being myopic or have to justify the cost of their research to their corporate masters by comparing what they've done to Google. From the Blog Epidemic Analyzer FAQ:

2. What's the point?

There has been a lot of discussion over the fairness of blogs, powerlaws, and A-list bloggers (You can look at the discussion on Many2Many for some of the highlights). The reality is that some blogs get all the attention. This means that with ranking algorithms like Technorati's and Google's Page Rank highly linked blogs end up at the top of search pages. Sometimes (maybe frequently) this is what you want. However, it is also possible that you don't want the most connected blog. Rather you would like to find the blog that discovers new information first.

The above answer makes it sound like these guys have no idea what they are talking about, since Google and Technorati do vastly different things. The fact that Google's search engine lists highly linked blogs at the top of search results they are only tangentially related to is a bug. For example, the fact that a random post by Russell Beattie about a company makes him the fifth result that comes up when you search for that company in Google isn't a feature, it's a bug. The goal of Google (and all search engines) is to provide the most relevant results for a particular search term. In the past, tying relevance to popularity was a good idea, but with the advent of weblogs and the noise they've added to the World Wide Web it is becoming less and less of one. Technorati, on the other hand, has one express purpose: measuring weblog popularity based on incoming links.

The HP iRank algorithm would be a nice companion piece to things like Technorati and BlogPulse but comparing it to Google seems like a stretch.
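To make that distinction concrete, here is a toy contrast between ranking by popularity and ranking by discovery time, which is roughly the distinction the FAQ is drawing. The Blog type and both rankings are made up for illustration and have nothing to do with the actual iRank algorithm.

using System;
using System.Collections.Generic;

class Blog {
    public string Name;
    public int IncomingLinks;        // what popularity-based ranking rewards
    public DateTime FirstLinkedMeme; // what discovery-based ranking rewards
}

class Rankings {
    // Technorati-style: the most-linked blogs come first.
    public static void SortByPopularity(List<Blog> blogs) {
        blogs.Sort(delegate(Blog a, Blog b) {
            return b.IncomingLinks.CompareTo(a.IncomingLinks);
        });
    }

    // Discovery-style: the blogs that linked to the meme earliest come first.
    public static void SortByDiscovery(List<Blog> blogs) {
        blogs.Sort(delegate(Blog a, Blog b) {
            return a.FirstLinkedMeme.CompareTo(b.FirstLinkedMeme);
        });
    }
}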


 

Categories: Technology