I just noticed that Arve Bersvendsen has written a post entitled 11 ways to valid RSS, in which he states that he has seen 11 different ways of providing content in an RSS feed, namely

Content in the description element

I have so far identified five different variants of content in the <description> element:

  1. Plaintext as CDATA with HTML entities - Validate
  2. HTML within CDATA - Validate
  3. HTML escaped with entities - Validate
  4. Plain text in CDATA - Validate
  5. Plaintext with inline HTML using escaping - Validate

<content:encoded>

I have encountered and identified two different ways of using <content:encoded>:

  1. Using entities - Validate
  2. Using CDATA - Validate

XHTML content

Finally, I have encountered and identified four different ways in which people have specified XHTML content:

  1. Using <xhtml:body> - Validate
  2. Using <xhtml:div> - Validate
  3. Using <body> with default namespace - Validate
  4. Using <div> with default namespace - Validate

At first these seem like a lot, until you actually try to program against them using an XML parser. The first thing you notice is that there is no difference between programming against CDATA and programming against escaped entities, since both are syntactic sugar. For example, the XML infoset and data models compatible with it, such as the XPath data model, do not differentiate character content that is written as character references, CDATA sections or entered directly. So the following

    <test><![CDATA[ ]]>2</test>
    <test>&#160;2</test>
    <test> 2</test>

are all equivalent. More directly, if you loaded all three into an instance of System.Xml.XmlDocument and checked their InnerText property, they'd all return the same result.
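Here is a minimal sketch of that check in C#, assuming (since the markup above doesn't show it) that the whitespace in the first and third documents is the non-breaking space that &#160; expands to:

    using System;
    using System.Xml;

    class InfosetDemo
    {
        static void Main()
        {
            // The same character content written three ways: in a CDATA
            // section, as a character reference, and directly. \u00A0 is
            // the non-breaking space that &#160; expands to.
            string[] docs = {
                "<test><![CDATA[\u00A0]]>2</test>",
                "<test>&#160;2</test>",
                "<test>\u00A02</test>"
            };

            foreach (string xml in docs)
            {
                XmlDocument doc = new XmlDocument();
                doc.LoadXml(xml);
                // Prints True three times; the parser has already erased the
                // lexical differences by the time you see the text.
                Console.WriteLine(doc.DocumentElement.InnerText == "\u00A02");
            }
        }
    }

So this reduces the first two sections of Arve's list to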

Content in the description element

I have so far identified two different variants of content in the <description> element:

  1. HTML
  2. Plain text

<content:encoded>

I have encountered and identified one way of using <content:encoded>:

  1. Containing escaped HTML content

If your code makes any distinctions other than these then it is a sign that you (a) have misunderstood how to process RSS or (b) are using a crappy XML parser. When I first started working on RSS Bandit I was also confused by these distinctions, but after a while things became clearer. The only problem left is the description element, since you can't tell whether it contains HTML or not without guessing. Since RSS Bandit always hands the content to an embedded web browser this isn't a problem, but I can see how it could be one for aggregators that don't know how to process HTML (although I've never seen one before).
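For what it's worth, here is a crude sketch of the kind of guessing an HTML-averse aggregator would be stuck with; the heuristic is entirely hypothetical and easily fooled, which is exactly the problem:

    using System;
    using System.Text.RegularExpressions;

    class DescriptionSniffer
    {
        // A hypothetical heuristic: guess that a <description> holds HTML
        // if it contains anything resembling a tag or an entity reference.
        // There is no reliable way to do this, which is the point above.
        static bool LooksLikeHtml(string description)
        {
            return Regex.IsMatch(description, @"</?[A-Za-z][^>]*>|&#?\w+;");
        }

        static void Main()
        {
            Console.WriteLine(LooksLikeHtml("Plain text, no markup here"));       // False
            Console.WriteLine(LooksLikeHtml("Some <b>bold</b> text &amp; more")); // True
        }
    }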

Another thing Arve seems to misunderstand is how namespaces work in XML. A few years ago I wrote an article entitled XML Namespaces and How They Affect XPath and XSLT where I wrote

A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character...The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.

Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
        <complexType id="123" name="fooType"/>
</schema>
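To see this from code rather than by eyeballing the markup, here is a minimal sketch using System.Xml; a namespace-aware parser reports the same local name and namespace URI for all three root elements, regardless of prefix:

    using System;
    using System.Xml;

    class QNameDemo
    {
        static void Main()
        {
            // The same element with three different prefixes, including
            // the default namespace (abbreviated from the documents above).
            string[] docs = {
                "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'/>",
                "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'/>",
                "<schema xmlns='http://www.w3.org/2001/XMLSchema'/>"
            };

            foreach (string xml in docs)
            {
                XmlDocument doc = new XmlDocument();
                doc.LoadXml(xml);
                XmlElement root = doc.DocumentElement;
                // A namespace-aware processor only cares about these two
                // values, which are identical for all three documents.
                Console.WriteLine("{0} :: {1}", root.LocalName, root.NamespaceURI);
            }
        }
    }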

Bearing this information in mind, Arve's XHTML examples reduce to

XHTML content

Finally, I have encountered and identified two different ways in which people have specified XHTML content:

  1. Using <xhtml:body>
  2. Using <xhtml:div>

Thus with judicious use of an XML parser (which makes sense, since RSS is an XML format), Arve's list of eleven ways of providing content in RSS is actually whittled down to five. I assume Arve is unfamiliar with XML processing, which led to his initial confusion.

NOTE: Before anyone bothers to start pointing out that Atom somehow frees aggregator authors from this myriad of options, I'll point out that Atom has more ways of encoding content than these. Even ignoring the inconsequential differences in XML syntactic sugar (escaped tags vs. unescaped tags in CDATA sections), the various combinations of the <summary> and <content> elements, the mode attribute (escaped vs. xml) and MIME types (text/plain, text/html, application/xhtml+xml) more than double the number of variations possible in RSS.


 

Categories: XML

March 16, 2004
@ 05:10 PM

While hanging around TheServerSide.com I discovered a series on Naked Objects. It's an interesting idea that eschews separating application layers in GUIs (via MVC) or server applications (presentation/business logic/data access layers) and instead involves coding only domain model objects, which then have a standard GUI autogenerated for them. There are currently five articles in the series, listed below along with my initial impressions of each.

Part 1: The Case for Naked Objects: Getting Back to the Object-Oriented Ideal
Part 2: Challenging the Dominant Design of the 4-Layer Architecture
Part 3: Write an application in Java and deploy it on .Net
Part 4: Modeling simultaneously in UML, Java, and User Perspectives
Part 5: Fat is the new Thin: Building Rich Internet Applications with Naked Objects

Part 1 points out that many N-tier server-side applications have four layers: persistence, the domain model, the controller and presentation. The author points out that object-relational mapping frameworks are now popular as a mechanism for collapsing the domain model and persistence layers. Naked objects comes from the other angle and attempts to collapse the domain model, controller and presentation layers. The article also argues that current practices in application development, such as web services and component-based architectures that separate data access from the domain model, reduce many of the benefits of object-oriented programming.

In a typical naked objects application, the framework uses reflection to determine the methods of an object and render them using a generic graphical user interface (screenshot). This encourages objects to be 'behaviourally complete': all significant actions that can be performed on the object must exist as methods on the object.
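The core mechanism is easy to sketch in a few lines of C#; note that the Customer class and its actions are invented for illustration, and the actual Naked Objects framework is a Java library:

    using System;
    using System.Reflection;

    // An invented, 'behaviourally complete' domain object: every
    // significant business action lives on the object itself.
    class Customer
    {
        public void BookInCarForService() { /* ... */ }
        public void GenerateBill()        { /* ... */ }
    }

    class GenericUserInterface
    {
        static void Main()
        {
            // A naked-objects-style framework discovers actions via
            // reflection and would render each one as a menu item or
            // button on the object's generic visual representation.
            foreach (MethodInfo method in typeof(Customer).GetMethods(
                BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly))
            {
                Console.WriteLine("Action: " + method.Name);
            }
        }
    }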

The author states that there are six benefits of using naked objects

  • Higher development productivity through not having to write a user interface
  • More maintainable systems through the enforced use of behaviourally-complete objects
  • Improved usability deriving from a pure object-oriented user interface
  • Easier capture of business requirements, because the naked objects constitute a common language between developers and users
  • Improved support for test-driven development
  • Easier cross-platform development

What I found interesting about the first article in the series is that the author rails against separating the domain model from the data access layer, yet naked objects seem to be more about blending the GUI layer with the domain model. There seem to be some missing pieces in the article. Perhaps the implication is that one should use object-relational mapping technologies in combination with naked objects to collapse an application from 4 layers to a single 'behaviourally complete' domain model?

Part 2 focuses on implementing the functionality of a 4-layer application using naked objects. One of the authors had written a tutorial application for a book: software for running an auto servicing shop, which performed tasks like booking in cars for service and billing the customer. The conclusion after the rewrite was that the naked objects implementation took fewer lines of code and had fewer classes than the previous 4-layer implementation. It also took less time to add new functionality, such as obtaining a customer's sales history, in the naked objects implementation than in the 4-layer implementation.

There are caveats. One is that the user interface was not as rich as the one where the developer had an explicit presentation layer, as opposed to relying on a generic autogenerated user interface. Also, complex operations such as 'undoing' a business action were not supported in the naked objects implementation.

Part 3 points out that if you write a naked objects implementation targeting Java 1.1 then you can compile it using J# without modification. Thus porting from Java to .NET should be a cinch as long as you use only Java 1.1. Nothing new here.

Part 4 points out that naked objects encourages “code first design”, which the authors claim is a good thing. They also point out that if one really wants to get UML diagrams out of a naked objects application, one can use tools like Together which generate UML from source code.

I'm not sure I agree that banging out code first and writing use cases or design documents afterwards is a software development methodology worth encouraging.

Part 5 trots out the old saw about rich internet applications and how much better they are than limiting HTML-based browser applications. The author points out that writing a Java applet which uses the naked objects framework gives a richer user experience than an HTML-based application. However, as mentioned in the previous articles, you could build an even richer client interface with an explicit presentation layer instead of relying on the generic user interface provided by the naked objects framework.

Interesting ideas. I'm not sure how well they'd scale up to building real-world applications but it is always good to challenge assumptions so developers don't get complacent. 


 

Categories: Technology

March 15, 2004
@ 04:35 PM

In what seems to be the strangest news story I've read this year, I find out that Sun snatches up XML guru. What I found particularly interesting in the story was the following excerpt

One of the areas Bray expects to work on is developing new applications for Web logs, or "blogs," and the RSS (Resource Description Framework Site Summary) technology that grew out of them. "I think that this is potentially a game-changer in some respects, and there are quite a few folks at Sun who share that opinion," he said.

Though RSS is traditionally thought of as a Web publishing tool, it could be used for much more than keeping track of the latest posts to blogs and Web sites, Bray said. "I would like to have an RSS feed to my bank account, my credit card, and my stock portfolio," he said.

Personally I think it's a waste of Tim Bray's talents having him work on RSS or its competitor du jour, Atom, but it should be fun seeing whether he can get Sun out of its XML funk as well as stop them from spreading poisonous ideas like replacing XML with ASN.1.

Update: Tim Bray has a post about his new job entitled Sunny Boy where he writes

That aside, I’m comfy being officially a direct competitor of Microsoft. On the technical side, I find the APIs inelegant, the UI aesthetics juvenile, and the neglect of the browser maddening.

Sounds like fighting words. This should be fun. :)


 

Categories: XML

My homegirl, Gretchen Ledgard (y'know Josh's wife), has helped start the Technical Careers @ Microsoft weblog. According to her introductory post you'll find stuff like

  • Explanation of technical careers at Microsoft.  What do people really do at Microsoft?  What does a “typical” career path look like?  What can you do to prepare yourself for a career at Microsoft?
  • Sharing of our recruiting expertise.  Learn “trade secrets” from Microsoft recruiters!  What does a good resume look like?  How can you get noticed on the internet?  How should you best prepare for an interview?
  • Information on upcoming Microsoft Technical Recruiting events and programs. 
I hope Gretchen dishes up the dirt on how the Microsoft recruiters deal with competition for a candidate, such as when a prospective hire also has an offer from another attractive company such as Google. Back in my college days, the company that was most competitive with Microsoft was Trilogy (what a difference a few years make).

I remember when I first got my internship offer and told my recruiter I also had an offer from i2 Technologies, she quickly whipped out a pen and did the math comparing the compensation I'd get at Microsoft to what I'd get from i2. I eventually picked Microsoft over i2 for that summer internship, which definitely turned out to be a life altering decision. Ahhh, memories.


     

After lots of procrastination we now have online documentation for RSS Bandit. As usual, the current table of contents is just a placeholder and the real content is Phil Haack's Getting Started with RSS Bandit. The table of contents for the documentation I plan to write [once the MSDN XML Developer Center launches in about a week or so] is laid out below.

• Bandit Help
  • Getting Started
    • What is an RSS feed?
    • What is an Atom feed?
    • The RSS Bandit user interface
    • Subscribing to a feed
    • Locating new feeds
    • Displaying feeds
    • Changing the web browser security settings
    • Configuring proxy server settings
  • Using RSS Bandit from Multiple Computers
    • Synchronization using FTP
    • Synchronization using a dasBlog weblog
    • Synchronization using a local or network file
  • Advanced Topics
    • Customizing the Way Feeds Look using XSLT
    • Creating Search Folders
    • Adding Integration with your Favorite Search Engine
    • Building and Using RSS Bandit plugins
  • Frequently Asked Questions
  • How to Provide Feedback
  • Contributing to RSS Bandit

If you are an RSS Bandit user I'd love to get your feedback.


     

Categories: RSS Bandit

I'm not sure which takes the cake for geekiest wedding proposal: popping the question on a customized PC case or on a custom Magic: The Gathering™ card.

Anyone else have some particularly geeky wedding proposals to share?


     

It is now possible to use RSS Bandit to read protected LiveJournal feeds now that they support HTTP authentication. Brad Fitzpatrick wrote

Digest auth for RSS
Apparently this was never announced:

http://www.livejournal.com/users/USER/data/rss?auth=digest

Get RSS feeds (including protected entries) by authenticating with HTTP Digest Auth. Good for aggregators.

Good indeed. Given that I agitated for this in a previous post, I'd like to thank the LiveJournal folks for implementing this feature.
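As a sketch of what this enables on the client side, a .NET-based aggregator could fetch such a protected feed as shown below; USER and PASSWORD are placeholders for real LiveJournal credentials:

    using System;
    using System.IO;
    using System.Net;

    class ProtectedFeedFetcher
    {
        static void Main()
        {
            // USER and PASSWORD are placeholders; substitute real
            // LiveJournal credentials.
            string url = "http://www.livejournal.com/users/USER/data/rss?auth=digest";

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            // HttpWebRequest negotiates HTTP Digest authentication
            // automatically when the server sends a Digest challenge.
            request.Credentials = new NetworkCredential("USER", "PASSWORD");

            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                Console.WriteLine(reader.ReadToEnd()); // the raw RSS XML
            }
        }
    }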


     

Categories: RSS Bandit

I've written about this before, but a recent mail from David Stutz and rumblings about slipped dates pushed this topic to the forefront of my mind today. If you have competition whose mantra is to ship "little components that can be combined together" and "release early, release often", is it wise to counter this with a strategy that involves integrating monolithic applications into even larger applications, thus multiplying the complexity and the integration issues you have to deal with?

On the one hand, no one can argue that the success of Microsoft Office isn't related to the fact that it is a suite of programs that work well together, but on the other hand, as David Stutz wrote in his farewell email

As the quality of this software improves, there will be less and less reason to pay for core software-only assets that have become stylized categories over the years: Microsoft sells OFFICE (the suite) while people may only need a small part of Word or a bit of Access. Microsoft sells WINDOWS (the platform) but a small org might just need a website, or a fileserver. It no longer fits Microsoft's business model to have many individual offerings and to innovate with new application software. Unfortunately, this is exactly where free software excels and is making inroads. One-size-fits-all, one-app-is-all-you-need, one-api-and-damn-the-torpedoes has turned out to be an imperfect strategy for the long haul.

Digging in against open source commoditization won't work - it would be like digging in against the Internet, which Microsoft tried for a while before getting wise. Any move towards cutting off alternatives by limiting interoperability or integration options would be fraught with danger, since it would enrage customers, accelerate the divergence of the open source platform, and have other undesirable results. Despite this, Microsoft is at risk of following this path, due to the corporate delusion that goes by many names: "better together," "unified platform," and "integrated software." There is false hope in Redmond that these outmoded approaches to software integration will attract and keep international markets, governments, academics, and most importantly, innovators, safely within the Microsoft sphere of influence. But they won't.

Exciting new networked applications are being written. Time is not standing still. Microsoft must survive and prosper by learning from the open source software movement and by borrowing from and improving its techniques. Open source software is as large and powerful a wave as the Internet was, and is rapidly accreting into a legitimate alternative to Windows. It can and should be harnessed. To avoid dire consequences, Microsoft should favor an approach that tolerates and embraces the diversity of the open source approach, especially when network-based integration is involved.

I don't agree with the general implication of David's comments, but I do believe there is a grain of truth in what he writes. The issues aren't as black and white as he paints them, but his opinions can't be written off either. The writing is definitely on the wall; I just wonder if anyone is reading it.


     

Categories: Life in the B0rg Cube

I hung out with Lili Cheng on Tuesday and we talked about various aspects of social software and weblogging technologies. She showed me Wallop while I showed her RSS Bandit, and we were both suitably impressed by each other. She liked the fact that in RSS Bandit you can view weblogs as conversations, while I liked the fact that, unlike other social software I've seen, Wallop embraced blogging and content syndication.

We both felt there was more to social software than projects like Orkut, and she had an interesting insight: Orkut and tools like it turned the dreary process of creating a contact list into something fun. And that's about it. I thought to myself that if a tool like Outlook allowed me to create contact lists in an Orkut-style manner instead of the tedious process that exists today, that would really be a killer feature. She also showed me a number of other research projects created by the Microsoft Research Social Computing Group. The one I liked best was MSR connections, which can infer the relationships between people based on public information exposed via Active Directory. The visualizations produced were very interesting and quite accurate as well. She also showed me a project called Visual Summaries which implemented an idea similar to Don Park's Friendship Circle using automatically inferred information.

One of the things Lili pointed out to me about aggregators and blogging is that people like to share links and other information with friends. She asked how RSS Bandit enables that, and I mentioned the ability to export feed lists to OPML and email people blog posts, as well as the ability to invoke w.bloggar from RSS Bandit. This did get me thinking that we should do more in this space. One piece of low-hanging fruit is that users should be able to export a category in their feed list to OPML instead of being limited to exporting the whole feed list, since they may only want to share a subset of their feeds. I have this problem because my boss has asked to try out my feed list, but I've hesitated to send it to him since I know I subscribe to a bunch of stuff he won't find relevant. I also find some of the ideas in Joshua Allen's post about FOAF + De.licio.us to be very interesting if you substitute browser integration with RSS Bandit integration. He wrote

To clarify, by better browser integration I meant mostly integration with the de.licio.us functionality.  For example, here are some of the things I would like to see:

• Access to my shared bookmarks from directly in the favorites menu of IE; and adding to favorites from IE automatically adds to de.licio.us (I’m aware of the bookmarklet)

• When browsing a page, any hyperlinks in the page which are also entries in your friends’ favorites lists would be emphasized with stronger or larger typeface

• When browsing a page, some subtle UI cue would be given for pages that are in your friends’ favorites lists

• When visiting a page that is in your favorites list, you could check a toolbar button to make the page visible to friends only, everyone, or nobody

• When visiting a page which appears in other people’s favorites, provide a way to get at recommendations: “your friends who liked this page also liked these other pages”, or “your friends recommend these pages instead of the one you are visiting”

I honestly think the idea becomes much more compelling when you share all page visits, and not just favorites.  You can cluster, find people who have similar tastes or even people who have opposite tastes.  And the “trace paths” could be much more useful.

Ahhh, so many ideas yet so little free time. :)


     

Categories: RSS Bandit | Technology

March 11, 2004
@ 04:54 PM

Dave Winer has a proposal for merging RSS and Atom. I'm stunned. It takes a bad idea (coming up with a redundant XML syndication format that is incompatible with existing ones) and merges it with a worse idea (having all these people who dislike Dave Winer have to work with him).

After adding Atom support to RSS Bandit, a thought which had been forming for a while crystallized in my head: Atom really is just another flavor of RSS with different tag names. It looks like I'm not the only aggregator author to come to this conclusion; Luke Huttemann came to the same one when describing SharpReader's implementation of Atom. What this means in practice is that once you've written some code that handles one flavor of RSS, be it RSS 0.91, RSS 1.0, or RSS 2.0, adding support for the other flavors isn't that hard; they basically all carry the same information, just hidden behind different tag names (pubDate vs. dc:date, admin:errorReportsTo vs. webMaster, author vs. dc:creator, etc.). To the average user of any popular aggregator there isn't any noticeable difference when subscribed to an RSS 1.0 feed vs. an RSS 2.0 feed, or an RSS 2.0 feed vs. an Atom feed.
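As an illustrative sketch (not RSS Bandit's actual code), normalizing one such difference, item dates, could look like this:

    using System;
    using System.Xml;

    class FeedNormalizer
    {
        const string DC = "http://purl.org/dc/elements/1.1/";

        // Return an item's date no matter which flavor of RSS spelled it:
        // RSS 0.91/2.0 use <pubDate>, RSS 1.0 feeds typically use <dc:date>.
        static string GetItemDate(XmlElement item)
        {
            foreach (XmlNode child in item.ChildNodes)
            {
                XmlElement e = child as XmlElement;
                if (e == null) continue;
                if ((e.LocalName == "pubDate" && e.NamespaceURI.Length == 0) ||
                    (e.LocalName == "date" && e.NamespaceURI == DC))
                    return e.InnerText;
            }
            return null;
        }

        static void Main()
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml("<item xmlns:dc='http://purl.org/dc/elements/1.1/'>" +
                        "<dc:date>2004-03-11T16:54:00Z</dc:date></item>");
            Console.WriteLine(GetItemDate(doc.DocumentElement));
        }
    }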

And just like with RSS, aggregators will special-case popular Atom feeds that exhibit weird behavior which isn't described in any spec, or that interpret the specs in an unconventional manner. As Phil Ringnalda points out, Blogger Atom feeds claim that the summary contains XHTML when in fact it contains plain text. This doesn't sound like a big deal until you realize that in XHTML whitespace isn't significant (e.g. newlines are treated as spaces), which leads to poorly formatted content when the aggregator displays as XHTML what in truth is plain text. Sam Ruby's Atom feed contains relative links in the <url> and <link> elements but doesn't use xml:base. There is code in most aggregators to deal with weird but popular RSS feeds, and it seems Atom is already gearing up to be the same way. Like I said, just another flavor of RSS. :)
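For comparison, a feed that did use xml:base would let consumers resolve relative links mechanically; here is a minimal sketch with an invented entry, since Sam's actual feed omits the attribute:

    using System;
    using System.Xml;

    class XmlBaseDemo
    {
        static void Main()
        {
            // An invented Atom 0.3 entry that declares its base URI.
            string xml =
                "<entry xmlns='http://purl.org/atom/ns#' " +
                "xml:base='http://www.intertwingly.net/blog/'>" +
                "<link rel='alternate' type='text/html' href='1742.html'/>" +
                "</entry>";

            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml);

            XmlElement entry = doc.DocumentElement;
            XmlElement link = (XmlElement)entry.FirstChild;

            // Combine the declared base with the relative href.
            Uri baseUri = new Uri(entry.GetAttribute("xml:base"));
            Console.WriteLine(new Uri(baseUri, link.GetAttribute("href")));
            // Prints: http://www.intertwingly.net/blog/1742.html
        }
    }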

As an aside, I find it interesting that Sam Ruby's RSS 2.0 feed currently provides a much better user experience for readers than his Atom feed. The following information is in Sam's RSS feed but not his Atom feed:

• The email address of the webmaster of the site [who to send error reports to]
• The number of comments per entry
• An email address for sending a response to an entry
• A web service endpoint for posting comments to an entry from an aggregator
• An identifier for the tool that generated the feed
• The trackback URL of each entry

What this means is that if you subscribe to Sam's RSS feed with an aggregator such as SharpReader or RSS Bandit, you'll get a better user experience than if you used his Atom feed. Of course, Sam could easily put all the namespace extensions that are in his RSS feed into his Atom feed as well, in which case the user experience of subscribing to either feed would be indistinguishable.

Arguing about XML syndication formats is meaningless because the current crop all pretty much do the same thing. On that note, I'd like to point out that websites that provide multiple syndication formats are quite silly. Besides confusing people trying to subscribe to the feed, there isn't any reason to provide an XML syndication feed in more than one format. Particularly silly are the sites that provide both RSS and Atom feeds (like mine).

Blogger has it right here by providing only one feed format per blog (RSS or Atom). Where they screwed up is in forcing users to make the choice instead of making it for them. That's on par with asking whether they want the blog served up using HTTP 1.0 or HTTP 1.1. I'm sure there are some people who care, but for the most part it is a pointless technical question to shove in the face of your users.


     

Categories: XML