June 14, 2004
@ 09:53 AM

For the past few weeks my friend Chris has had an open invitation for me to play Settlers of Catan with him and a couple of other guys in the U-district. Today I finally accepted, and when I got there it turned out that one of the guys was Evan Martin, one of the devs for LiveJournal. Evan had just graduated from college and this was his last weekend in Seattle before moving to the Bay Area to start work at Google. Once we were introduced he mentioned that he knew of me and that, in fact, I was the reason he unsubscribed from the atom-syntax mailing list. It seems in one of the early discussions about Atom I wrote something which he felt was a technically valid point but was delivered in a scathing manner (i.e. punctuated with a flame), so he decided to bow out of further discussions about "RSS with different tag names". This reminded me of a comment by Robert Sayre in Joshua's weblog

OTOH, your post was free of insults, hyperbole, and condescension. Dare is usually right when there is an actual technical issue, but we're talking politics

My level of exasperation with a lot of what was going on with the Atom effort made me more scathing than I tend to be in my usual email discourse. This is one of the reasons I unsubscribed from the list, but it seems I hurt a couple of people's feelings along the way. Sometimes it is easy to forget that the people on the other end of an email thread aren't former denizens of git.talk.flame who relish technical arguments spiced with flame. My apologies to anyone else who was similarly affected by my comments.

Anyway, we all (Jag, Chris, Evan and I) played a three-hour game of Catan while partaking of some of the nice bourbon thoughtfully provided by Chris. Evan seems like he would have been a decent guy to talk to about blogging and syndication related technologies. I hope he enjoys his new job at Google.

TiVo calls...


 

Categories: Ramblings

June 13, 2004
@ 04:10 PM

I just found out that Lloyd Banks is about to drop an album, Hunger For More. All I can say is G-G-G-G-G-Unit. Cop that shit.

By the way, if you haven't copped Twista's Kamikaze, you should. It's not as gangsta as Adrenaline Rush; instead it's more radio-friendly, but still off the chain. Almost every track sounds good enough to be a single, definitely all killer no filler.


 

Categories: Ramblings

I recently read a post by Jeff Dillon (a Sun employee) entitled .NET and Mono: The libraries where he criticizes the fact that the .NET Framework has Windows-specific APIs. Specifically he writes

Where this starts to fall apart is with the .NET and Mono libraries. The Java API writers have always been very careful not to introduce an API which does not make sense on all platforms. This makes Java extremely portable at the cost of not being able to do native system programming in pure Java. With .NET, Microsoft went ahead and wrote all kinds of APIs for accessing the registry, accessing COM objects, changing NTFS file permissions, and other very windows specific tasks. In my mind, this immediately eliminates .NET or Mono from ever being a purely system independent platform.

While I was still digesting his comments and considering a response I read an excellent followup by Miguel De Icaza in his post On .NET and portability where he writes

First lets state the obvious: you can write portable code with C# and .NET (duh). Our C# compiler uses plenty of .NET APIs and works just fine across Linux, Solaris, MacOS and Windows. Scott also pointed to nGallery 1.6.1 Mono-compliance post which has some nice portability rules.
...
It is also a matter of how much your application needs to integrate with the OS. Some applications needs this functionality, and some others do not.

If my choice is between a system that does not let me integrate with the OS easily or a system that does, I personally rather use the later and be responsible for any portability issues myself. That being said, I personally love to write software that takes advantage of the native platform am on, specially on the desktop.

At first I was confused by Jeff's post given that it assumes that the primary goal of the .NET Framework is to create a Write Once Run Anywhere platform. It's been fairly obvious from all the noise coming out of Redmond about WinFX that the primary goal of the .NET Framework is to be the next generation Windows programming API which replaces Win32. By the way, check out the WinFX API overview as JPG or as PDF. Of course, this isn't to say that Microsoft isn't interested in creating an interoperable managed platform, which is why there has been ECMA standardization of C#, the Common Language Infrastructure (CLI) and the Base Class Library (BCL). The parts of the .NET Framework that are explicitly intended to be interoperable across platforms are all part of the ECMA standardization process. That way developers can have their cake and eat it too: a managed API that takes full advantage of their target platform, and a subset of that API which is intended to be interoperable and is standardized through the ECMA process.
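To make this concrete, here's the sort of Windows-specific managed code at issue; a minimal sketch (the registry key and value names are made up for illustration) using the Microsoft.Win32.Registry API, which is exactly the kind of API that falls outside the ECMA standardized subset:

using System;
using Microsoft.Win32; // Windows specific; not part of the ECMA standardized BCL

class RegistryDemo
{
    static void Main()
    {
        // Hypothetical key and value names, purely for illustration.
        RegistryKey key = Registry.LocalMachine.OpenSubKey(@"SOFTWARE\MyApp");

        if (key == null)
        {
            Console.WriteLine("MyApp is not installed");
            return;
        }

        try
        {
            // Read a string value from the registry.
            string installPath = (string) key.GetValue("InstallPath");
            Console.WriteLine("Installed at: " + installPath);
        }
        finally
        {
            key.Close();
        }
    }
}

Code like this will obviously never run anywhere but Windows, which per Miguel's point is a tradeoff the developer takes responsibility for, not a defect in the platform.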

Now that I think about it, I realize that folks like Jeff probably have no idea what is going on in .NET developer circles and assume that the goals of Microsoft with the .NET Framework are the same as those of Sun with Java. That explains why he positions what many see as a flaw of the Java platform as a benefit that Microsoft has erred in not repeating. I guess one man's meat is another man's poison.


 

Categories: Technology

In a recent post entitled 15 Science Street, Tim Bray, one of the inventors of XML, writes

Microsoft’s main talking point (I’m guessing here from the public documents) was that their software and format had the advantage that in WordML you can edit documents from arbitrary schemas.

Our pushback on that was that editing arbitrary-schema documents is damn hard and damn expensive and has never been anything more than a niche business.

which seems not to jibe with my experience. Many businesses have XML formats specific to their target industry (LegalXML, HR-XML, FpML, etc.) and many businesses use office productivity suites to create and edit documents. It seems very logical to expect that people would like to use their existing spreadsheet and word processing applications to edit their business documents instead of using XML editors or specialized tools. More interestingly, Tim Bray contradicts his position that editing documents from user-defined schemas is a niche scenario when he writes

As we were winding up, a couple of really smart people (don’t know who they were) put up their hands and asked real good questions. The best was essentially “What would you like to see happen?” After some back and forth, I ended up with “You should have the right to own your own information. It’s your intellectual capital and you worked hard to produce it for your citizens. Sun doesn’t own it, Microsoft doesn’t own it, you own it, and that means it should be living in a nice, long-lived, non-proprietary data format that isn’t anyone’s competitive weapon.”

He took the words right out of my mouth. This is exactly what Microsoft has done with Office 2003 by allowing users to edit documents in XML formats of their choosing. In the letter Bringing the XML Vision to the Desktop with Office 2003, written by Jean Paoli of Microsoft (also a co-inventor of XML), he writes

an even greater and more innovative benefit is the fact that companies can now create their own XML schemas specific to their business, define the structure and type of data that each data element in a document contains and exchange information with customers and business partners more easily. This capability opens up a whole new realm of possibilities, not only for end users, but also for the business itself because now organizations can capture and reuse critical information that in the past has been lost or gone unused. 

Office 2003 is a great step forward in enabling businesses and end users to harness the power of XML in typical document interchange scenarios. Arguments about whether you should use Sun's XML format or Microsoft's XML format aren't the point. The point is which tools allow you to use your XML format with the most ease.
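As a simple illustration of what using "your own XML format" looks like in code, here's a minimal sketch (the schema namespace and file names are hypothetical) of validating a business document against a user-defined schema with the .NET Framework's XmlValidatingReader; this is the same kind of customer-defined schema that Office 2003 lets users edit against in Word and Excel:

using System;
using System.Xml;
using System.Xml.Schema;

class ValidateBusinessDoc
{
    static void Main()
    {
        // Hypothetical files: a purchase order document and the
        // company's own schema for it.
        XmlTextReader raw = new XmlTextReader("purchase-order.xml");
        XmlValidatingReader reader = new XmlValidatingReader(raw);
        reader.ValidationType = ValidationType.Schema;
        reader.Schemas.Add("urn:example-com:purchase-order", "purchase-order.xsd");
        reader.ValidationEventHandler += new ValidationEventHandler(OnValidationError);

        // The reader validates the document as it is read.
        while (reader.Read()) { }
        reader.Close();
        Console.WriteLine("Done validating");
    }

    static void OnValidationError(object sender, ValidationEventArgs e)
    {
        Console.WriteLine("Validation issue: " + e.Message);
    }
}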

 


Categories: XML

I recently wrote that I want to make RSS Bandit compete more with commercial aggregators, which elicited a comment asking what exactly this means. Primarily it means that it is my intention that we support what I consider to be the three primary differentiating features of the commercial desktop aggregators I've seen (NetNewsWire, FeedDemon and NewzCrawler). The features are

  1. Newspaper Views: FeedDemon has the ability to display news items in a newspaper view, a feature that Torsten batted around a few months ago but decided not to do because we didn't think it was that useful. However, now that I read a number of feeds that tend to publish 30 - 50 items a day, being able to view the entries in a single page actually would be useful. My goal is for this feature to be 100% compatible with FeedDemon newspaper views, meaning that you can use existing FeedDemon newspapers such as Radek's newspaper views for FeedDemon with RSS Bandit. There's a rough sketch of the idea after this list.

  2. WYSIWYG Weblog Editor: This feature was on my old RSS Bandit wishlist but I never got around to implementing it because of my displeasure with the MetaWeblog API. I've been waiting for the Atom project to produce a SOAP-based API with built-in authentication that would be widely supported by blogging tools before implementing this feature, but it is now clear that such a specification won't be finalized anytime soon. Since I don't do much GUI work I'll definitely need help from either Torsten or Phil with getting this done.

  3. NNTP Support: The promise of providing a uniform interface to various discussion forums, whether they are Web-based discussions exposed via RSS or on USENET, is too attractive to pass up.
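On the newspaper views front, my working assumption is that a newspaper template is essentially an XSLT stylesheet applied to a batch of feed items. Here's a minimal sketch of that idea (the file names are made up, and this is how I imagine the feature, not a finalized design):

using System.Text;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

class NewspaperDemo
{
    static void Main()
    {
        // A newspaper template is just an XSLT stylesheet that renders
        // a batch of feed items as a single HTML page.
        XslTransform newspaper = new XslTransform();
        newspaper.Load("newspaper-template.xsl");

        // The unread items to render, serialized as XML.
        XPathDocument items = new XPathDocument("unread-items.xml");

        XmlTextWriter html = new XmlTextWriter("newspaper.html", Encoding.UTF8);
        newspaper.Transform(items, null, html, null);
        html.Close();
    }
}

If that assumption holds, being 100% compatible with FeedDemon is mostly a matter of handing our item XML to the same stylesheets.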

Of course, we will also fix various bugs and respond to the feature requests we've gotten from a number of our users. Torsten is currently on vacation and I'll most likely be gone for a week later this month, so development probably won't start in earnest until next month. Until then keep your feedback coming, and thanks a lot for using RSS Bandit.


 

Categories: RSS Bandit

Chris Sells has announced the call for speakers for the Applied XML Developers Conference 5. From his post

Are you interested in presenting a 45-minute talk on some applied XML or Web Services topic? It doesn't matter which platform or OS you're targeting. It also doesn't matter whether you're an author or vendor or professional speaker or a developer in the trenches (in fact, I tend to be biased towards the latter). We're after interesting and unique applications of XML and Web Services technology and if you're doing good work in that area, then I need you to send me a session topic and 2-4 sentence abstract along with a little bit about yourself. I'll be taking submissions 'til the end of June, but don't delay...

...the conference itself is likely to be in Oregon during the 2nd or 3rd week of September, 2004, but we're still working the details out. One of the fun things that we're thinking about this year is to have the Dev.Conf. in Sunriver, Oregon, a resort and spa town in central Oregon where sun is plentiful and rain is scarce.

Previous XML DevCons have had a wide variety of interesting speakers. Unfortunately, the XML DevCon webpage doesn't provide any information on previous conferences. If you are interested in reports on last year's conference just type "XML DevCon" in your favorite Web search engine to locate blog postings from some of the attendees.

I probably won't be at this conference since the focus is usually XML Web Services, while my professional interests are in core XML technologies; working with XML syndication formats is just a hobby. However there should be lots of interesting presentations on XML Web Services and other leading-edge applications of XML from industry experts, if last year's conference is anything to go by.


 

Categories: XML

June 8, 2004
@ 09:22 AM

Jon Udell has started a series of blog posts about the pillars of Longhorn.  So far he has written Questions about Longhorn, part 1: WinFS and Questions about Longhorn, part 2: WinFS and semantics which ask the key question "If the software industry and significant parts of Microsoft such as Office and Indigo have decided on XML as the data interchange format, why is the next generation file system for Windows basically an object oriented database instead of an XML-centric database?" 

I'd be very interested in what the WinFS folks like Mike Deem would say in response to Jon if they read his blog. Personally, I worry less about how well WinFS supports XML and more about whether it will be fast, secure and failure-resistant. After all, at worst WinFS will support XML as well as a regular file system does today, which is good enough for me to locate and query documents with my favorite XML query language. On the other hand, if WinFS doesn't perform well or shows the same good-idea-but-poorly-implemented nature as the Windows registry then it'll be a non-starter or, much worse, a widely used but often cursed aspect of Windows development (just like the Windows registry).
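To show what I mean, here's a minimal sketch (the folder path and query are made up) of locating documents with an XPath query over nothing fancier than today's file system:

using System;
using System.IO;
using System.Xml.XPath;

class FindDocs
{
    static void Main()
    {
        // Hypothetical folder and query: find saved documents whose
        // title mentions WinFS.
        string[] files = Directory.GetFiles(@"C:\My Documents\feeds", "*.xml");

        foreach (string file in files)
        {
            XPathNavigator nav = new XPathDocument(file).CreateNavigator();

            // Evaluate returns an object; the boolean() function yields a bool.
            if ((bool) nav.Evaluate("boolean(//item[contains(title, 'WinFS')])"))
                Console.WriteLine(file);
        }
    }
}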

As Jon Udell points out, the core scenarios touted as encouraging the creation of WinFS (i.e. search and adding metadata to files) don't really need a solution as complex or as intrusive to the operating system as WinFS. The only justification for something as radical and complex as WinFS is if Windows application developers end up utilizing it to meet their needs. However, as an application developer on the Windows platform I primarily worry about three major aspects of WinFS. The first is performance: I definitely think having a query language over an optimized store in the file system is all good, but I wouldn't use it if the performance wasn't up to snuff. The second is security: Longhorn evangelists like talking up what a wonderful world it would be if all my apps could share their data, but they ignore the fact that in reality this can lead to disasters. Having multiple applications share the same data store, where one badly written application can corrupt the entire store, is worrisome. This is the fundamental problem with the Windows registry and, to a lesser extent, the cause of DLL hell in Windows. The third thing I worry about is that the programming model will suck. An easy-to-use programming model often trumps almost any other concern. Developers prefer building distributed applications using XML Web Services in .NET to the alternatives even though in some cases this choice leads to lower performance. The same developers would rather store information in the registry than come up with a robust alternative on their own because the programming model for the registry is fairly straightforward.

All things said, I think WinFS is an interesting idea. I'm still not sure it is a good idea but it is definitely interesting. Then again given that WinFS assimilated and thus delayed a very good idea from shipping, I may just be a biased SOB.

PS: I just saw that Jeremy Mazner posted a followup to Jon Udell's post entitled Jon Udell questions the value and direction of WinFS where he wrote

XML formats with well-defined, licensed schemas are certainly a great step towards a world of open data interchange. But XML files alone don't make it easier for users to find, relate and act on their information. Jon's contention is that full text search over XML files is good enough, but is it really? I did a series of blog entries on WinFS scenarios back in February, and I don't think Jon's full text search approach would really enable these things.

Jeremy mostly misses Jon's point which is aptly reduced to a single question at the beginning of this post. Jon isn't comparing full text search over random XML files on your file system to WinFS. He is asking why couldn't WinFS be based on XML instead of being an object oriented database.


 

Categories: Technology | XML

June 6, 2004
@ 04:41 AM

Tim Bray has a post entitled Whiskey-Bar Economics where he writes

As an added bonus, in the comments someone has posted a pointer to this, which (if even moderately accurate) is pretty astounding.

I'm not sure what is pretty astounding about CostOfWar.com. The Javascript on the site seems pretty basic, the core concept behind the site is opportunity cost, which is explained in the freshman economics class of the average college or university, and the numbers from the site actually seem to be lowballed considering all the headlines I seem to read every month about the Bush administration requesting another couple of billion for the Iraq effort. For example, according to a USA Today article entitled Bush to request $25 billion for Iraq war costs, the US Congress had already approved $163 billion for the war on Iraq when another request for $25 billion showed up. Yet at the current time CostOfWar.com claims that the war has cost $116 billion.

On the other hand, I think this is pretty astounding.


 

Categories: Ramblings

June 6, 2004
@ 04:18 AM

One of my friends, Joshua Allen, is a fan of RDF and Semantic Web technologies. Given that I respect his opinion a lot, I keep trying to delve into RDF and its family of technologies every couple of months to see what it provides to the world of data access and information interchange above and beyond existing technologies. Recently I discovered that there are some in the RDF camp who position it as a "better XML". The first example of this I saw was an old article by Tim Berners-Lee entitled Why RDF model is different from the XML model. According to Tim, the note is an attempt to answer the question, "Why should I use RDF - why not just XML?". However, instead of answering the question his note just left me with more questions than answers. The pivotal point for me in Tim Berners-Lee's note is the following excerpt

Things you can do with RDF which you can't do with XML include

  • You can parse the semantic tree, which end up giving you a set of (possibly mutually referential) triples and then you can use the ones you want ignoring the ones you don't understand.

Problems with basing your understanding on the structure include

  • Without having gone to the trouble of getting the schema, or having an application hand-programmed to recognise a particular document type, you can't pick up any semantic information from a document;
  • When an XML schema changes, it could typically introduce new intermediate elements (like "details" in the tree above or "div" in HTML). These may or may not invalidate any query which has been based on the structure of the document.
  • If you haven't gone to the trouble of making a semantic model, then you may not have a well defined one.

It seems that the point being argued is that with RDF you can get more understanding of the information in the document than with just XML. Given that one could consider RDF to be just a logical model layered on top of an XML document (e.g. RDF/XML), I find it hard to understand how viewing an XML document through RDF-colored glasses buys one so much more understanding of the data.

Recently I discovered a presentation entitled REST, Self-description, and XML by Mark Baker. This presentation discusses the ideas in Tim Berners-Lee's note in more depth and in a way I finally understand. The first key idea in Mark's presentation is the notion of "self describing" data formats, which were also covered in Tim Berners-Lee's presentation at WWW2002 entitled Specs Count. The core tenets of "self describing" data formats are covered in slide 10 and slide 11 of Mark's presentation. A "self describing" data format contains all the data needed to figure out how to process the format from publicly accessible specs. For example, an HTTP response tells you the MIME type of the document, which can be used to locate the appropriate RFC which governs how the format should be processed. In the case of XML, Tim Berners-Lee states that an HTTP response which returns an XML document as either application/xml or text/xml should be processed according to the rules of the XML and XML namespaces recommendations, which state that the identity of an element is determined based on its namespace name. So when processing an XML document, Tim asserts that it is self describing because one can locate the spec for the format from the namespace URI of the root element. Of course, Mark disagrees with this, but his reasons for doing so are pedantic spec lawyering. I disagree with it as well, but for different reasons. The main reason I disagree is that it puts a stake in the ground and says that any XML format on the Web that doesn't use a namespace name for its root element, or whose namespace name is not a dereferenceable URI that leads to a spec, is broken. This automatically states that XML formats used on the Web today such as RSS 1.0, RSS 2.0, OPML and the Atom 0.3 syndication format are broken.

Mark then goes on to state in slide 20 that a problem with XML formats is that one can't arbitrarily extend an XML document without its schema or without breaking some application somewhere. It's unclear what he means by the document's schema, but I will grant that it is likely that arbitrary additions to the expected content of an XML document will break certain applications. Getting to slide 24, it is slightly clearer what Mark is getting at. He claims that although one can extend a format by adding extra elements from a known namespace using just XML technologies, this doesn't tell you how to deal with the extensions. On the other hand, with RDF the extensions are all concepts named with a URI whose meaning can then be looked up using HTTP GET. This is where he lost me. I don't see the difference between seeing a namespaced XML element in an XML format and using HTTP GET on the namespace URI of the element to locate the spec or schema for the namespaced extension, and what he describes as the gains of using RDF.
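To illustrate the point, here's a minimal sketch (the document URL is hypothetical) of doing exactly that with plain XML technologies; read the root element's namespace name, then dereference it with an HTTP GET, no RDF required:

using System;
using System.Net;
using System.Xml;

class NamespaceDispatch
{
    static void Main()
    {
        // Load some XML document off the Web.
        XmlTextReader reader = new XmlTextReader("http://www.example.com/doc.xml");
        reader.MoveToContent();

        // Per the XML namespaces recommendation, the identity of the
        // root element is determined by its namespace name.
        string ns = reader.NamespaceURI;
        Console.WriteLine("Root element namespace: " + ns);
        reader.Close();

        // An application that doesn't recognize the namespace can simply
        // dereference it to find the spec or schema, which as far as I can
        // tell is all the RDF approach amounts to as well.
        if (ns.StartsWith("http://"))
        {
            WebClient client = new WebClient();
            byte[] spec = client.DownloadData(ns);
            Console.WriteLine("Fetched " + spec.Length + " bytes from the namespace URI");
        }
    }
}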

The more I look at how RDF people bag on XML, the more it seems that they don't really write applications in today's world. In almost every situation where I've seen someone claim that RDF technologies will in the future be able to solve a problem XML cannot, the problem is not only solvable with XML technologies but is actually being solved using XML technologies today.


 

Categories: XML

One of the more annoying aspects of writing Windows applications using the .NET Framework is that eventually you brush up against the limitations of the APIs provided by the managed classes and end up having to use interop to talk to Win32 or COM-based APIs. This process typically involves exposing native code in a manner that makes it look like a managed API when in fact it is not. When there is an error in this mapping it results in hard-to-track memory corruption errors. All of the fun of annoying C and C++ memory corruption errors in the all new, singing and dancing .NET Framework.

Most recently we were bitten by this in RSS Bandit and probably would never have tracked this problem down if not for a coincidence. As part of forward compatibility testing at Microsoft, a number of test teams run existing .NET applications on current builds of future versions of the .NET Framework. One of these test teams decided to use RSS Bandit as a test application. However it seemed they could never get RSS Bandit to start without the application crashing almost instantly. Interestingly, it crashed at different points in the code depending on whether one compiled and ran the application on the current build of the .NET Framework or just ran an executable compiled against an older version of the .NET Framework on the current build. Bugs were filed against folks on the CLR team and the problem was tracked down.

It turns out that our declaration of the STARTUPINFO struct, obtained from PInvoke.NET, was incorrect. Specifically, the following fields were declared as

 [MarshalAs(UnmanagedType.LPWStr)] public string  lpReserved;
 [MarshalAs(UnmanagedType.LPWStr)] public string  lpDesktop;
 [MarshalAs(UnmanagedType.LPWStr)] public string  lpTitle;

when they should have been declared as

public IntPtr lpReserved; 
public IntPtr lpDesktop; 
public IntPtr lpTitle;

The reason for not declaring them as strings is that the interop marshaler, after having converted the native string to a managed string, will release the native data using CoTaskMemFree. This is clearly not the right thing to do in this case, since that memory is owned by the operating system rather than the marshaler, so we need to declare the fields as IntPtrs and then manually marshal them to strings via the Marshal.PtrToStringUni() API.
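For the curious, here's roughly what the full fix looks like end to end; a minimal sketch that declares the corrected struct and does the manual marshaling, using GetStartupInfo (which fills in a STARTUPINFO for the current process) to have something concrete to read:

using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct STARTUPINFO
{
    public int cb;
    public IntPtr lpReserved;   // IntPtr, not string, so the marshaler
    public IntPtr lpDesktop;    // never calls CoTaskMemFree on memory
    public IntPtr lpTitle;      // the operating system owns
    public int dwX;
    public int dwY;
    public int dwXSize;
    public int dwYSize;
    public int dwXCountChars;
    public int dwYCountChars;
    public int dwFillAttribute;
    public int dwFlags;
    public short wShowWindow;
    public short cbReserved2;
    public IntPtr lpReserved2;
    public IntPtr hStdInput;
    public IntPtr hStdOutput;
    public IntPtr hStdError;
}

class StartupInfoDemo
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    static extern void GetStartupInfo(out STARTUPINFO lpStartupInfo);

    static void Main()
    {
        STARTUPINFO si;
        GetStartupInfo(out si);

        // Marshal the native pointers to strings by hand; PtrToStringUni
        // copies the characters but leaves the native memory alone.
        string desktop = Marshal.PtrToStringUni(si.lpDesktop);
        string title = Marshal.PtrToStringUni(si.lpTitle);

        Console.WriteLine("Desktop: " + desktop);
        Console.WriteLine("Title: " + title);
    }
}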

The problem with such memory corruption errors is that their effects are unpredictable. Some users may never witness a crash, others witness a crash when their machines are under memory pressure, and in some cases the application crashes right away. Of course, the crash is never in the same place twice. Not only do these problems waste lots of developer time in tracking them down, they also lead to a negative user experience with the target application.

Hopefully, when Longhorn ships and introduces WinFX this class of problem will become a thing of the past. In the meantime, I need to spend some time going over our code that does all the Win32 interop to ensure that there are no other such issues waiting to rear their heads.


 

Categories: Technology