I recently wrote that I want to make RSS Bandit compete more with commercial aggregators, which elicited a comment asking what exactly this means. Primarily, it means I intend for us to support what I consider to be the three primary differentiating features of the commercial desktop aggregators I've seen (NetNewsWire, FeedDemon and NewzCrawler). The features are

  1. Newspaper Views: FeedDemon can display news items in a newspaper view, a feature Torsten batted around a few months ago but decided not to pursue because we didn't think it was that useful. However, now that I read a number of feeds that tend to publish 30 to 50 items a day, being able to view the entries in a single page actually would be useful. My goal is for this feature to be 100% compatible with FeedDemon newspaper views, meaning that you can use existing FeedDemon newspapers, such as Radek's newspaper views for FeedDemon, with RSS Bandit.

  2. WYSIWYG Weblog Editor: This feature was on my old RSS Bandit wishlist but I never got around to implementing it because of my displeasure with the MetaWeblog API. I've been waiting for the Atom project to produce a SOAP-based API with built-in authentication that would be widely supported by blogging tools before implementing this feature, but it is now clear that such a specification won't be finalized anytime soon. Since I don't do much GUI work I'll definitely need help from either Torsten or Phil to get this done.

  3. NNTP Support: The promise of providing a uniform interface to various discussion forums, whether they are Web-based discussions exposed via RSS or USENET newsgroups, is too attractive to pass up.

Of course, we will also fix bugs and respond to the feature requests we've gotten from a number of our users. Torsten is currently on vacation and I'll most likely be gone for a week later this month, so development probably won't start in earnest until next month. Until then, keep your feedback coming and thanks a lot for using RSS Bandit.

Categories: RSS Bandit

Chris Sells has announced the call for speakers for the Applied XML Developers Conference 5. From his post

Are you interested in presenting a 45-minute talk on some applied XML or Web Services topic? It doesn't matter which platform or OS you're targeting. It also doesn't matter whether you're an author or vendor or professional speaker or a developer in the trenches (in fact, I tend to be biased towards the latter). We're after interesting and unique applications of XML and Web Services technology and if you're doing good work in that area, then I need you to send me a session topic and 2-4 sentence abstract along with a little bit about yourself. I'll be taking submissions 'til the end of June, but don't delay...

...the conference itself is likely to be in Oregon during the 2nd or 3rd week of September, 2004, but we're still working the details out. One of the fun things that we're thinking about this year is to have the Dev.Conf. in Sunriver, Oregon, a resort and spa town in central Oregon where sun is plentiful and rain is scarce.

Previous XML DevCons have had a wide variety of interesting speakers. Unfortunately, the XML DevCon webpage doesn't provide any information on previous conferences. If you are interested in reports on last year's conference, just type "XML DevCon" into your favorite Web search engine to locate blog postings from some of the attendees.

I probably won't be at this conference since the focus is usually XML Web Services, while my professional interests are in core XML technologies; working with XML syndication formats is just a hobby. However, there should be lots of interesting presentations on XML Web Services and other leading-edge applications of XML from industry experts, if last year's conference is anything to go by.

Categories: XML

June 8, 2004
@ 09:22 AM

Jon Udell has started a series of blog posts about the pillars of Longhorn. So far he has written Questions about Longhorn, part 1: WinFS and Questions about Longhorn, part 2: WinFS and semantics, which ask the key question: "If the software industry and significant parts of Microsoft such as Office and Indigo have decided on XML as the data interchange format, why is the next generation file system for Windows basically an object oriented database instead of an XML-centric database?"

I'd be very interested in what WinFS folks like Mike Deem would say in response to Jon if they read his blog. Personally, I worry less about how well WinFS supports XML and more about whether it will be fast, secure and failure-resistant. After all, at worst WinFS will support XML as well as a regular file system does today, which is good enough for me to locate and query documents with my favorite XML query language. On the other hand, if WinFS doesn't perform well or shows the same good-idea-but-poorly-implemented nature as the Windows registry, then it'll be a non-starter or, much worse, a widely used but often cursed aspect of Windows development (just like the Windows registry).
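To be concrete about the "good enough today" claim, here is a minimal sketch of locating and querying XML documents on a plain file system using the .NET Framework's XPath support; the folder path and query expression are made up for illustration:

using System;
using System.IO;
using System.Xml.XPath;

class XmlFileQuery
{
    static void Main()
    {
        // Run an XPath query over every XML document in a folder --
        // "locate and query" on a plain file system, no WinFS required.
        foreach (string path in Directory.GetFiles(@"C:\docs", "*.xml"))
        {
            XPathDocument doc = new XPathDocument(path);
            XPathNodeIterator it = doc.CreateNavigator().Select("//title");
            while (it.MoveNext())
                Console.WriteLine(path + ": " + it.Current.Value);
        }
    }
}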

As Jon Udell points out, the core scenarios touted as justifying the creation of WinFS (i.e. search and adding metadata to files) don't really need a solution as complex or as intrusive to the operating system as WinFS. The only justification for something as radical and complex as WinFS is if Windows application developers end up utilizing it to meet their needs. However, as an application developer on the Windows platform I primarily worry about three major aspects of WinFS. The first is performance; having a query language over an optimized store in the file system is all good, but I wouldn't use it if the performance wasn't up to snuff. Secondly, I worry about security; Longhorn evangelists like talking up what a wonderful world it would be if all my apps could share their data, but they ignore the fact that in reality this can lead to disasters. Having multiple applications share the same data store, where one badly written application can corrupt the entire store, is worrisome. This is the fundamental problem with the Windows registry and, to a lesser extent, the cause of DLL hell in Windows. The third thing I worry about is that the programming model will suck. An easy-to-use programming model often trumps almost any problem. Developers prefer building distributed applications using XML Web Services in .NET to the alternatives even though in some cases this choice leads to lower performance. The same developers would rather store information in the registry than come up with a robust alternative on their own because the programming model for the registry is fairly straightforward, as the sketch below illustrates.
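A rough illustration of that simplicity (the key path and value name here are hypothetical):

using System;
using Microsoft.Win32;

class RegistrySketch
{
    static void Main()
    {
        // Storing and retrieving a setting takes a few obvious lines,
        // which is exactly why developers reach for the registry.
        RegistryKey key = Registry.CurrentUser.CreateSubKey(@"Software\ExampleApp");
        key.SetValue("WindowWidth", 800);
        Console.WriteLine(key.GetValue("WindowWidth"));
        key.Close();
    }
}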

All things said, I think WinFS is an interesting idea. I'm still not sure it is a good idea, but it is definitely interesting. Then again, given that WinFS assimilated a very good idea and thus delayed it from shipping, I may just be a biased SOB.

PS: I just saw that Jeremy Mazner posted a follow-up to Jon Udell's post, entitled Jon Udell questions the value and direction of WinFS, in which he wrote

XML formats with well-defined, licensed schemas are certainly a great step towards a world of open data interchange. But XML files alone don't make it easier for users to find, relate and act on their information. Jon's contention is that full text search over XML files is good enough, but is it really? I did a series of blog entries on WinFS scenarios back in February, and I don't think Jon's full text search approach would really enable these things.

Jeremy mostly misses Jon's point, which is aptly reduced to a single question at the beginning of this post. Jon isn't comparing full text search over random XML files on your file system to WinFS. He is asking why WinFS couldn't be based on XML instead of being an object oriented database.

Categories: Technology | XML

June 6, 2004
@ 04:41 AM

Tim Bray has a post entitled Whiskey-Bar Economics where he writes

As an added bonus, in the comments someone has posted a pointer to this, which (if even moderately accurate) is pretty astounding.

I'm not sure what is pretty astounding about CostOfWar.com. The JavaScript on the site seems pretty basic, the core concept behind the site is opportunity cost, which is explained in the freshman economics class of the average college or university, and the numbers from the site actually seem to be lowballed considering all the headlines I seem to read every month about the Bush administration requesting another couple of billion for the Iraq effort. For example, according to a USA Today article entitled Bush to request $25 billion for Iraq war costs, the US Congress had already approved $163 billion for the War on Iraq when another request for $25 billion showed up, bringing the total to $188 billion. Yet at the current time CostOfWar.com claims that the war has cost $116 billion.

On the other hand, I think this is pretty astounding.

Categories: Ramblings

June 6, 2004
@ 04:18 AM

One of my friends, Joshua Allen, is a fan of RDF and Semantic Web technologies. Given that I respect his opinion a lot, I keep trying to delve into RDF and its family of technologies every couple of months to see what it provides to the world of data access and information interchange above and beyond existing technologies. Recently I discovered that there are some in the RDF camp who position it as a "better XML". The first example of this I saw was an old article by Tim Berners-Lee entitled Why RDF model is different from the XML model. According to Tim, the note is an attempt to answer the question, "Why should I use RDF - why not just XML?". However, instead of answering the question, his note just left me with more questions than answers. The pivotal point for me in Tim Berners-Lee's note is the following excerpt

Things you can do with RDF which you can't do with XML include

  • You can parse the semantic tree, which ends up giving you a set of (possibly mutually referential) triples, and then you can use the ones you want, ignoring the ones you don't understand.

Problems with basing your understanding on the structure include

  • Without having gone to the trouble of getting the schema, or having an application hand-programmed to recognise a particular document type, you can't pick up any semantic information from a document;
  • When an XML schema changes, it could typically introduce new intermediate elements (like "details" in the tree above or "div" in HTML). These may or may not invalidate any query which has been based on the structure of the document.
  • If you haven't gone to the trouble of making a semantic model, then you may not have a well-defined one.

It seems that the point being argued is that with RDF you can get more understanding of the information in the document than with just XML. Given that one could consider RDF as just a logical model layered on top of an XML document (e.g. RDF/XML), I find it hard to understand how viewing some XML document through RDF-colored glasses buys one so much more understanding of the data.

Recently I discovered a presentation entitled REST, Self-description, and XML by Mark Baker. This presentation discusses the ideas in Tim Berners-Lee's note in more depth and in a way I finally understand. The first key idea in Mark's presentation is the notion of "self-describing" data formats, which were also covered in Tim Berners-Lee's presentation at WWW2002 entitled Specs Count. The core tenets of "self-describing" data formats are covered in slide 10 and slide 11 of Mark's presentation. A "self-describing" data format contains all the data needed to figure out how to process the format from publicly accessible specs. For example, an HTTP response tells you the MIME type of the document, which can be used to locate the appropriate RFC that governs how the format should be processed. In the case of XML, Tim Berners-Lee states that an HTTP response which returns an XML document as either application/xml or text/xml should be processed according to the rules of the XML and XML namespaces recommendations, which state that the identity of an element is determined based on its namespace name. So when processing an XML document, Tim asserts that it is self-describing because one can locate the spec for the format from the namespace URI of the root element. Of course, Mark disagrees with this, but his reasons for doing so are pedantic spec lawyering. I disagree with it as well, but for different reasons. The main reason I disagree with it is that it puts a stake in the ground and says that any XML format on the Web that doesn't use a namespace name for its root element, or whose namespace name is not a dereferenceable URI that leads to a spec, is broken. This automatically states that XML formats used on the Web today such as RSS 1.0, RSS 2.0, OPML and the Atom 0.3 syndication format are broken.

Mark then goes on to state in slide 20 that a problem with XML formats is that one can't arbitrarily extend an XML document without its schema or without breaking some application somewhere. It's unclear what he means by the document's schema, but I will grant that it is likely that arbitrary additions to the expected content of an XML document will break certain applications. Getting to slide 24, it is slightly clearer what Mark is getting at. He claims that although one can extend a format by adding extra elements from a known namespace using just XML technologies, this doesn't tell you how to deal with the extensions. On the other hand, with RDF the extensions are all concepts named with a URI whose meaning can then be looked up using HTTP GET. This is where he lost me. I don't see the difference between seeing a namespaced XML element in an XML format and using HTTP GET on the namespace URI of the element to locate the spec or schema for the namespaced extension, and what he describes as the gains of using RDF.
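To make that concrete, here is a minimal sketch of the plain-XML version of that "follow your nose" approach; the feed URL is hypothetical, and this assumes the extension's namespace URI is an HTTP URL that actually serves a spec or schema:

using System;
using System.IO;
using System.Net;
using System.Xml;

class NamespaceDereferencer
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.Load("http://example.com/feed.xml");

        foreach (XmlNode node in doc.DocumentElement.ChildNodes)
        {
            // A real aggregator would skip elements in the format's own
            // namespace and only chase unrecognized extension elements.
            XmlElement extension = node as XmlElement;
            if (extension == null || extension.NamespaceURI.Length == 0)
                continue;

            // Dereference the namespace URI with an HTTP GET to locate
            // the spec or schema governing the extension element.
            WebRequest request = WebRequest.Create(extension.NamespaceURI);
            WebResponse response = request.GetResponse();
            StreamReader reader = new StreamReader(response.GetResponseStream());
            Console.WriteLine(extension.Name + ": " + extension.NamespaceURI);
            Console.WriteLine(reader.ReadToEnd().Length + " characters of spec retrieved");
            reader.Close();
            response.Close();
        }
    }
}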

The more I look at how RDF people bag on XML, the more it seems that they don't really write applications in today's world. In almost every situation where I've seen someone claim that RDF technologies will in the future be able to solve a problem XML cannot, the problem is actually not only solvable with XML technologies but is being solved using XML technologies today.

Categories: XML

One of the more annoying aspects of writing Windows applications using the .NET Framework is that eventually you brush up against the limitations of the APIs provided by the managed classes and end up having to use interop to talk to Win32 or COM-based APIs. This process typically involves exposing native APIs in a manner that makes them look like managed APIs when in fact they are not. When there is an error in this mapping, it results in hard-to-track memory corruption errors. All of the fun of annoying C and C++ memory corruption errors in the all-new, singing and dancing .NET Framework.

Most recently we were bitten by this in RSS Bandit and probably would never have tracked this problem down if not for a coincidence. As part of forward compatibility testing at Microsoft, a number of test teams run existing .NET applications on current builds of future versions of the .NET Framework. One of these test teams decided to use RSS Bandit as a test application. However it seemed they could never get RSS Bandit to start without the application crashing almost instantly. Interestingly, it crashed at different points in the code depending on whether one compiled and ran the application on the current build of the .NET Framework or just ran an executable compiled against an older version of the .NET Framework on the current build. Bugs were filed against folks on the CLR team and the problem was tracked down.

It turns out that our declaration of the STARTUPINFO struct, obtained from PInvoke.NET, was incorrect. Specifically, the following fields were declared as

 [MarshalAs(UnmanagedType.LPWStr)] public string  lpReserved;
 [MarshalAs(UnmanagedType.LPWStr)] public string  lpDesktop;
 [MarshalAs(UnmanagedType.LPWStr)] public string  lpTitle;

when they should have been declared as

public IntPtr lpReserved; 
public IntPtr lpDesktop; 
public IntPtr lpTitle;

The reason for not declaring them as strings is that the Interop marshaler, after converting the native string to a managed string, will release the native data using CoTaskMemFree. This is clearly not the right thing to do in this case, so we need to declare the fields as IntPtrs and then manually marshal them to strings via the Marshal.PtrToStringUni() API.
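Put together, the corrected pattern looks roughly like the following; this is a sketch of the full STARTUPINFO declaration rather than our exact RSS Bandit code:

using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
struct STARTUPINFO
{
    public int cb;
    public IntPtr lpReserved;  // IntPtr, not string: the OS owns this memory,
    public IntPtr lpDesktop;   // so the Interop marshaler must never try to
    public IntPtr lpTitle;     // free it with CoTaskMemFree
    public int dwX;
    public int dwY;
    public int dwXSize;
    public int dwYSize;
    public int dwXCountChars;
    public int dwYCountChars;
    public int dwFillAttribute;
    public int dwFlags;
    public short wShowWindow;
    public short cbReserved2;
    public IntPtr lpReserved2;
    public IntPtr hStdInput;
    public IntPtr hStdOutput;
    public IntPtr hStdError;
}

class InteropHelper
{
    // Manually marshal one of the IntPtr fields when the string is needed;
    // the native memory is read but never freed from managed code.
    internal static string TitleOf(STARTUPINFO si)
    {
        return si.lpTitle == IntPtr.Zero ? null : Marshal.PtrToStringUni(si.lpTitle);
    }
}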

The problem with errors that occur due to such memory corruption issues is that their results are unpredictable. Some users may never witness a crash, others witness it when their machines are under memory pressure, and in some cases the application crashes right away. Of course, the crash is never in the same place twice. Not only do these problems waste lots of developer time in tracking them down, they also lead to a negative user experience with the target application.

Hopefully, when Longhorn ships and introduces WinFX, this class of problem will become a thing of the past. In the meantime, I need to spend some time going over our code that does all the Win32 interop to ensure that there are no other such issues waiting to rear their heads.

Categories: Technology

I find it interesting how often developers tend to reinvent the wheel because they look at a problem from only one perspective. Today I read a blog post by Sean Gephardt called RSS and syndication Ideas? where he repeats two common misconceptions about RSS and syndication technologies. He wrote

What if I only want certain folks to have access to my RSS?

I could require the end user to sign in to my site, then provide them access to my RSS feeds, but then they would be required to sign in every time they tried to update their view.

More specifically, how could a company track people that have subscribed to a particular RSS feed once they are viewing it in an aggregator? Obviously, if someone actually views the page referenced, then web site tracking applies, but some aggregators I've seen simply render the contents of the description, and if that contains a URL somewhere and the user clicks the link, the reader gets taken over to that URL, bypassing the original.

Since there is no security around RSS and aggregators, and no way to prompt users for, say, Passport authentication, should RSS be used only for "public" information? Do you make people sign in once they try to access the “deeper” content? Do you keep the RSS content limited to help drive people to the “real” content?

Am I missing something glaringly obvious?

Considering that fetching an RSS feed is simply fetching an XML document over the Web using HTTP, and there are existing technologies for authenticating and encrypting HTTP requests, I'd have to say "Yes, you have missed something glaringly obvious, Sean". Not only can you authenticate and encrypt RSS feeds with the same mechanisms used by the rest of the World Wide Web, aggregators like RSS Bandit already support this functionality. In fact, here is a list of aggregators that support private RSS feeds.
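For instance, here is a minimal sketch of fetching a feed protected by HTTP authentication over SSL with the .NET Framework; the URL and credentials are made up for illustration:

using System;
using System.IO;
using System.Net;

class PrivateFeedFetcher
{
    static void Main()
    {
        HttpWebRequest request = (HttpWebRequest)
            WebRequest.Create("https://example.com/private/feed.rss");

        // Plain old Web authentication applies to RSS just as it does to
        // HTML pages; SSL keeps the credentials and the content private.
        request.Credentials = new NetworkCredential("username", "password");

        WebResponse response = request.GetResponse();
        StreamReader reader = new StreamReader(response.GetResponseStream());
        Console.WriteLine(reader.ReadToEnd()); // the raw RSS XML
        reader.Close();
        response.Close();
    }
}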

As for how to track readership of content in RSS feeds: a number of tools, such as dasBlog and .TEXT, already support tracking such statistics using Web bugs. One could also utilize alternate approaches if the feeds are private, since one could assign a separate URL to each user.
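A sketch of that second approach, using a hypothetical URL scheme: hand each subscriber a unique feed URL, and the Web server's logs then reveal exactly who fetched the feed and when.

using System;

class FeedUrlGenerator
{
    // Each subscriber gets a distinct, unguessable feed URL.
    static string FeedUrlFor(Guid subscriberKey)
    {
        return "https://example.com/feeds/news.rss?subscriber=" + subscriberKey;
    }

    static void Main()
    {
        Console.WriteLine(FeedUrlFor(Guid.NewGuid()));
    }
}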

All of this is stuff that already works on today's World Wide Web when interacting with HTML and HTTP. It is interesting that some people think that once you swap out HTML for XML, entirely new approaches must be built from the ground up.

Josh Ledgard (who along with his wife Gretchen hosted an excellent barbecue this past Memorial Day weekend) has a post entitled Blogs, Alpha Builds, Customer Community, and Legal Issues where he discusses some of the questions around legal issues that some of us in the B0rg cube have been asking with regards to the growing push to be more open and interact more directly with customers. Josh writes

Blogging Disclaimers

Some Microsofties are now including disclaimer text at the end of each posting in addition to linking to a disclaimer on the sidebar.  A long internal thread went around where “best practice” guidance was given from a member of the legal team that included inserting the disclaimer into every entry as well as in any comment we leave on other blogs. 

The various discussions I've seen around blogging disclaimers often boil down to pointing out that they are unlikely to be useful in preventing real trouble (i.e. some customer who gets pissed at bad advice he gets from a Microsoft employee and decides to sue). Of course, I don't know if this has been tested in court, so take my opinions with a grain of salt. I look at disclaimers as a way of separating personal opinion from Microsoft's official position. This leads me to another statement from Josh

From a purely non-legal perspective I would also have to call BS on the standard disclaimer text.

“These postings are provided "AS IS" with no warranties, and confers no rights. The content of this site contains my own personal opinions and does not represent my employer's view in anyway.”

I have no problem with the first sentence, but the second would bother me a bit. What represents a company better than the collective values and opinions of its employees that are expressed through their blogs?

I completely disagree with Josh here. I don't believe that my personal opinion and Microsoft's official position are the same thing, even though some assume that we are b0rg. Also, I want to be able to make it clear when what I am saying is my personal opinion and when it somewhat reflects an official Microsoft position. For example, I am the program manager responsible for XML schema technologies in the .NET Framework, and statements I make around these technologies may be considered by some to be official statements, independent of where they are made. If I write “RELAX NG is an excellent XML schema language and is superior to W3C XML Schema for validating XML documents in a variety of cases”, some could consider this an indication that Microsoft will start supporting RELAX NG in some of its products. However, this would be an inaccurate assumption, since that comment is personal opinion and not a reflection of Microsoft's official policy, which is unified around using W3C XML Schema for describing the structure and validation of XML documents. I tend to use disclaimers to separate my personal opinions from my statements as a Microsoft employee, although a lot of the time the lines blur.

Categories: Life in the B0rg Cube

As I mentioned yesterday, Doug Purdy posted an insightful entry in response to Ted Neward's post about the inappropriateness of returning ADO.NET DataSets from XML Web Services. Today Ted Neward has a post entitled Why Purchase Orders are the root of all evil? which almost entirely misses the point of Doug's post.

Ted writes

Could you tell me what the schema should be? Doug, it's right there in front of you: the class definition itself is a schema definition for objects of its type. The question I think you mean to ask is, "What the XML schema should be for this Purchase Order?", but I can't do that, because you've already stepped way out into la-la land as far as XML/XSD goes by making use of generic types (like Dictionary) for which there is no XSD equivalent; sure, we can rpc-encode one up, but we're back to turning objects into XML and back again, and I thought we didn't like that....?

Could you tell me what each particle of the schema means? Well, the LineItemAddedEvent certainly isn't a schema construct, so I'm guessing that'll have to be the XML-based representation of a .NET delegate.... the IAddress has no implementation behind it that I can see so once again I'll have to punt....

Oh, I get it... Doug's using one of them anti-pattern thingies to show us what not to do when trying to define types in XML/XSD for use in Web services (or WebServices or web services or however we've decided to spell these silly things anyway).

You're absolutely right, Doug--the way that thing is written, Purchase Orders, while perhaps not the root of ALL evil, are certainly evil and therefore should be banned from the WS-* camp immediately.

Seriously, dude, DataSets as return values from Web services are evil. Get over it.

What I find interesting is that Ted Neward is looking at XML Web Services through the perspective of distributed objects. His entire argument hinges on the fact that his applications convert XML into Java or CLR objects, so the XML returned must be something that is conducive to converting to objects easily. Doug accurately points out that there is no one-to-one mapping between an XML schema and a CLR object. Arguing that your favorite platform has one-to-one mappings for some XML schemas and not others, and thus banning various XML formats from participating in XML Web Services, is a very limiting viewpoint. I'd like to ask Ted whether he would also ban XBRL, WordprocessingML or UBL documents from being used in XML Web Services because there aren't easy ways to convert them to a handy, dandy Java object with strongly typed members and all that jazz.

I don't dispute the practical reasons for discouraging developers from returning ADO.NET DataSets from XML Web Services, since most developers trying to access XML Web Services just use a toolkit that pretends they are building distributed object applications. Usually such toolkits either barf horribly when faced with XML they don't grok or force developers to deal with scary angle brackets directly instead of the object facade they know & love (ASP.NET XML Web Services included). This is a practical reason to avoid exposing ADO.NET DataSets from XML Web Services that may be accessed from Java platforms, especially since such platforms don't make it easy to deal with raw XML.
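To illustrate the practical issue, compare these two methods on a hypothetical ASP.NET Web service (a sketch, not taken from any real codebase); the first returns a DataSet whose generated WSDL is opaque to most non-.NET toolkits, while the second returns a plain type that maps to a simple, predictable schema:

using System.Data;
using System.Web.Services;

public class OrderService : WebService
{
    // Convenient on .NET, but most Java toolkits cannot turn the schema
    // this generates into a useful object.
    [WebMethod]
    public DataSet GetOrdersAsDataSet()
    {
        DataSet orders = new DataSet("Orders");
        // ... fill from a database ...
        return orders;
    }

    // A plain serializable type is easy for almost any toolkit to consume.
    [WebMethod]
    public PurchaseOrder GetOrder(int id)
    {
        PurchaseOrder po = new PurchaseOrder();
        po.Id = id;
        po.Customer = "Example Customer";
        po.Total = 42.00m;
        return po;
    }
}

public class PurchaseOrder
{
    public int Id;
    public string Customer;
    public decimal Total;
}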

On the other hand, claiming that there is some philosophical reason not to expose data from an XML Web Service that may be semi-structured and full of unknown data (i.e. XML data) seems quite antithetical to the entire point of XML Web Services and the Service Oriented Architecture fad.

Categories: XML

We have a few open developer and program manager positions on the WebData XML team at Microsoft. Are you interested in working on XML technologies that will impact not only significant aspects of Microsoft but the software industry at large? Would you like a chance to collaborate with teams as diverse as Office, Windows (Avalon, Indigo & WinFS), BizTalk, SQL Server and Visual Studio on building the next generation of XML technologies for the Microsoft platform? Do you get passionate about XML and related technologies? If so, take a gander at the open job descriptions on our team and, if you believe you qualify, send mail to xmljobATmicrosoft.com

See you soon.

Categories: Life in the B0rg Cube