I just spotted the following on the wiki Ward Cunningham set up requesting advice as a new hire to Microsoft.

Take a running start and don't look back

  1. Recognize that your wonderful inventiveness is the most valuable thing you will own in a culture that values its employees solely by their latest contributions. In a spartan culture like this, you will rise quickly.

  2. Keep spewing ideas, even when those ideas are repeatedly misunderstood, implemented poorly, and excised from products for reasons that have nothing to do with the quality of the idea. When you give up on communicating your new ideas, you will just go insane waiting to vest.

  3. Be patient, or better yet, don't even look back. Don't try to track and control what people do with your ideas. It will just make you jaded and cynical. (Like many of us who have gone before :)

  4. Communicate by writing things down in compact and considered form. The most senior people, who can take your ideas the furthest fastest, are very busy. As an added side-benefit, when random program managers who just don't get it come around for the fortieth time, begging for explanations, you can provide them references to your wiki, blog, or papers for the thirty-seventh time.

  5. Don't count on the research division for anything but entertaining politics.

Have a good time, and as Don said, plan for the long-haul!

I've been in the B0rg Cube just shy of two years but the above advice rings true in more ways than one. It is a very interesting culture and with the wrong attitude one could end up being very cynical. However as with all things, the best thing to do is learn how the system works and learn how to work it. The five points above are a good starting point.   
 

Categories: Life in the B0rg Cube

There were a number of sessions I found particularly interesting either because they presented novel ways to utilize and process XML or because they gave an insightful glance at how others view the XML family of technologies. 

Imperative Programming with Rectangles, Triangles, and Circles - Erik Meijer
This was a presentation about a research language called Xen that experiments with various ways to reduce the Relational<->Objects<->XML (ROX) impedance mismatch by adding concepts and operators from the relational and XML (specifically W3C XML Schema) worlds into an object oriented programming language. The main thesis of the paper was that heavily used APIs and programming idioms eventually become likely candidates for inclusion in the language. An example was given with the foreach operator in the C# language, which transformed the following regularly used idiom

IEnumerator e = ((IEnumerable)ts).GetEnumerator();
try {
   while(e.MoveNext()) { T t = (T)e.Current; t.DoStuff(); }
} finally {
   IDisposable d = e as System.IDisposable;
   if(d != null) d.Dispose();
}

into

foreach(T t in ts) {
   t.DoStuff();
}

The majority of the presentation was about XML integration. Erik spent some time talking about the XML to object impedance mismatch and how cumbersome programming with XML could be. Either you wrote a bunch of code for walking trees manually or you queried nodes with XPath, but then you are embedding one language into another and don't get type safety, etc (if there is an error in my XPath query I can't tell until runtime). He pointed out that various XML<->object mapping technologies fall short because they don't map a rich enough set of W3C XML Schema constructs to relevant object structures, and even if they did one would lose the power of being able to do rich XPath queries or XSLT/XQuery transformations. The XML integration in Xen basically came in 3 flavors: the ability to initialize classes from XML strings, support for W3C XML Schema constructs like union types and sequences in the language, and the ability to do XPath-like queries over the fields and properties of a class.
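To make the "can't tell until runtime" point concrete, here is a minimal sketch in Python rather than Xen or C#. The order document and its element names are invented for the illustration, and the standard library's ElementTree only supports a small XPath subset, but the failure mode is the same: the query is an opaque string as far as the host language is concerned.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document, invented for this illustration.
DOC = """
<order>
  <item sku="A1"><price>9.50</price></item>
  <item sku="B2"><price>20.00</price></item>
</order>
"""

root = ET.fromstring(DOC)

# The query is just a string to the host language, so the misspelled
# element name in the second query is not flagged by any compiler.
good = root.findall("item/price")
typo = root.findall("item/prise")  # silently matches nothing

# Results come back as untyped text; converting to a number is manual.
total = sum(float(p.text) for p in good)
```

Nothing complains about the misspelled query; it simply returns an empty list, which is arguably worse than an error.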

There were also a few other things like adding the constraint "not null" into the language (which would be a handy modifier for parameter names in any language given how often one must check parameters for null in method bodies) and the ability to apply the same method to all the members of a collection which seemed like valuable additions to a programming language independent of XML integration.
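Neither idea needs XML to be useful. The sketch below approximates both in Python; it is not Xen syntax, and the names `not_null`, `shout` and `apply_to_all` are hypothetical helpers made up for the example.

```python
import functools

def not_null(func):
    """Reject None arguments before the wrapped function body runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for position, value in enumerate(args):
            if value is None:
                raise ValueError(f"argument {position} of {func.__name__} must not be None")
        for name, value in kwargs.items():
            if value is None:
                raise ValueError(f"argument {name!r} of {func.__name__} must not be None")
        return func(*args, **kwargs)
    return wrapper

@not_null
def shout(text):
    # No explicit null check is needed in the body.
    return text.upper()

def apply_to_all(items, method_name, *args):
    """Call the named method on every member and collect the results."""
    return [getattr(item, method_name)(*args) for item in items]

names = ["rectangles", "triangles", "circles"]
titles = apply_to_all(names, "title")
```

A language-level modifier would catch many of these mistakes at compile time; the decorator above only moves the check out of the method body and still fails at runtime.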

Thinking about it I am unsure of the practicality of some features such as being able to initialize objects from an XML literal in the code especially since Xen only supported XML documents with schemas although in some cases I could imagine such an approach being more palatable than using XQuery or XSLT 2.0 for constructing or querying strongly typed XML documents. Also I was suspicious of the usefulness of being able to do wildcard queries (i.e. give me all the fields in class Foo) although this could potentially be used to get the string value of an XML element with mixed content.

The language also had integrated SQL like querying with a "select" operator but I didn't pay much attention to this since I was only really interested in XML.

The meat of this presentation is available online in the paper entitled Programming with Circles, Triangles and Rectangles. The presentation was well received although sparsely attended (about two or three dozen people) and the most noteworthy feedback was that from James Clark who was so impressed he kept saying "I'm speechless" in between asking questions about the language. Sam Ruby was also impressed by the fact that not only was there a presentation but the demo, which involved compiling and running various samples, showed that you could implement such a language on the CLR and even integrate it into Visual Studio.

Namespace Routing Language (NRL) - James Clark
This was a presentation about a language for validating a single XML document with multiple schemas simultaneously. This was specifically aimed at validating documents that contained XML from multiple vocabularies (e.g. XML content embedded in a SOAP envelope, RDF embedded in HTML, etc).

The core processing model of NRL is that it divides an XML document into sections, each containing elements from a single namespace, and then validates each section using the schema for its namespace. There is no requirement that the same schema language be used throughout, so one could validate one part of the document using RELAX NG and another using W3C XML Schema. There was also the ability to specify named modes, like those in XSLT, which allowed you to match element names against a particular schema instead of just keying off the namespace name. This functionality could be used to validate interleaved documents (such as XHTML within an XSLT stylesheet) but I suspect that this will be easier said than done in practice.
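The processing model is easier to picture with a toy version. The following Python sketch is not an NRL implementation: it only splits a tree into single-namespace sections and hands each one to a per-namespace callback. The `http://example.org/po` vocabulary and the lambda "validators" are hypothetical stand-ins for real RELAX NG or W3C XML Schema validators.

```python
import xml.etree.ElementTree as ET

def namespace_of(element):
    """Return the namespace URI from ElementTree's '{uri}local' tag form."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def local_name(element):
    return element.tag.rsplit("}", 1)[-1]

def split_sections(root):
    """Split a tree into sections whose elements share one namespace.

    Returns (namespace, [local names]) pairs, ordered by where each
    section starts in the document.
    """
    sections = []

    def walk(element, current):
        ns = namespace_of(element)
        if current is None or ns != current[0]:
            current = (ns, [])  # namespace changed: start a new section
            sections.append(current)
        current[1].append(local_name(element))
        for child in element:
            walk(child, current)

    walk(root, None)
    return sections

def validate(root, validators):
    """Check every section with the validator registered for its namespace."""
    reject = lambda names: False  # unknown namespaces fail validation
    return all(validators.get(ns, reject)(names) for ns, names in split_sections(root))

SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
PO = "http://example.org/po"  # hypothetical vocabulary

DOC = f"""
<s:Envelope xmlns:s="{SOAP}" xmlns:po="{PO}">
  <s:Body>
    <po:order><po:item/><po:item/></po:order>
  </s:Body>
</s:Envelope>
"""

root = ET.fromstring(DOC)
sections = split_sections(root)
validators = {
    SOAP: lambda names: names[0] == "Envelope",
    PO: lambda names: names[0] == "order" and set(names[1:]) <= {"item"},
}
ok = validate(root, validators)
```

The real language does considerably more (named modes, for one), but the split-then-dispatch shape is the part that lets different schema languages coexist in one validation pass.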

All in all this was a very interesting talk and introduced some ideas I'd never have considered on my own.  

There is a spec for the Namespace Routing Language available online.


 

Categories: XML

December 16, 2003
@ 05:33 PM

The XML 2003 conference was a very interesting experience. Compared to the talks at XML 2002 I found the talks at XML 2003 to be of more interest and relevance to me as a developer building applications that utilize XML. The hallway and lunchtime conversations I had with various people were particularly valuable. Below are the highlights from the conversations I had with some XML luminaries at lunch and over drinks. Tomorrow I'll post about the various talks I attended.

CONVERSATIONS
James Clark: He gave two excellent presentations, one on his Namespace Routing Language (NRL) and the other about some of the implementation techniques used in his nxml-mode for Emacs. I asked whether the fact that he gave no talks about RELAX NG meant that he was no longer interested in the technology. He responded that there wasn't really anything more to do with the language besides shepherding it through the standardization process and evangelizing it. However, given how entrenched support for W3C XML Schema was with major vendors, evangelization was an uphill battle.

I pointed out that at Microsoft we use XML schema language technologies for two things:

    1. Describing and enforcing the contract between producers and consumers of XML documents
    2. Creating the basis for processing and storing typed data represented as XML documents

The only widely used XML schema language that fits the bill for both tasks is W3C XML Schema. However W3C XML Schema is too complex and yet doesn't have enough features for the former, and has too many features which introduce complexity for the latter. In my ideal world, people would use something like RELAX NG for the former and XML-Data Reduced (XDR) for the latter. James asked if I saw value in creating a subset of RELAX NG which also satisfied the latter case but I didn't think that there would be a compelling argument for people who've already baked W3C XML Schema into the core of their being (e.g. XQuery, XML Web Services, etc) to find interest in such a subset.

In fact, I pointed out that in designing for Whidbey (the next version of the .NET Framework) we had originally designed the architecture to have a pluggable XML type system so that one could potentially generate Post Schema Validation Infosets (PSVI), but realized that this was a case of YAGNI. First of all, only one XML schema language exists that can generate PSVIs, so creating a generic architecture makes no sense if there is no other XML schema language that could be plugged in to replace W3C XML Schema. Secondly, one of the major benefits of this approach I had envisioned was that one would be able to plug their own type systems into XQuery. This turned out to be more complicated than I thought because XQuery has W3C XML Schema deeply baked into it and it would take more than genericizing at the PSVI level to make it work (we'd also have to genericize operators, type promotion rules, etc). Once all that effort had been expended, any language that could be plugged in would have to act a lot like W3C XML Schema anyway. Basically, if some RELAX NG subset suddenly came into existence, it wouldn't add much that we don't already get from W3C XML Schema (except less complexity, but you could get the same from coming up with a subset of W3C XML Schema or following my various W3C XML Schema Best Practices articles on XML.com).

I did think that there would be some value to developers building applications on Microsoft platforms who needed more document validation features than W3C XML Schema in having access to RELAX NG tools. This would be nice to have but isn't a showstopper preventing development of XML applications on Microsoft platforms (translation: Microsoft won't be building such tools in the foreseeable future). However if such tools existed I definitely would evangelize them to our users who needed more features than W3C XML Schema provides for their document validation needs.

Sam Ruby: I learned that Sam is on one of the "emerging technologies" groups at IBM. Basically he works on stuff that's about to become mainstream in a big way and helps it along the way. In the past this has included PHP, Open Source and Java (i.e. the Apache project), XML Web Services and now weblogging technologies. Given his track record I asked him to give me a buzz whenever he finds some new technology to work on. : )

I told him that I felt syndication formats weren't the problem with weblogging technologies and he seemed to agree but pointed out that some of the problems they are trying to solve with ATOM make more sense in the context of using the same format for your blog editing/management API and archival format. There were also the various interpersonal conflicts & psychological baggage which needs to be discarded to move the technology forward and a clean break seems to be the best way. On reflection, I agreed with him.

I did point out that the top 3 problems I'd like to fix in syndication were one click subscription, subscription harmonization and adding calendar events to feeds. I mentioned that I should have RFCs for the first two written up over the holidays but the third is something I haven't thought about hard. Sam pointed out that instead of going the route of coming up with a namespaced extension element to describe calendar events in an RSS feed that perhaps a better option is the ATOM approach that uses link tags. Something like

   <link type="text/calendar" href="...">

In fact he seemed to have liked this idea so much it ended up in his presentation.
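For what it's worth, consuming such a link from an aggregator would be trivial. Here is a rough Python sketch; the entry below is a made-up, namespace-free example rather than real ATOM markup, and `calendar_links` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed entry, simplified for illustration.
ENTRY = """
<entry>
  <title>XML 2003 wrap-up dinner</title>
  <link type="text/html" href="http://example.org/posts/42"/>
  <link type="text/calendar" href="http://example.org/posts/42.ics"/>
</entry>
"""

def calendar_links(entry):
    """Return the href of every link that advertises calendar data."""
    matches = entry.findall("link[@type='text/calendar']")
    return [link.get("href") for link in matches]

hrefs = calendar_links(ET.fromstring(ENTRY))
```

An aggregator that doesn't understand calendars can ignore the extra link entirely, which is part of the appeal over a namespaced extension element.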

As Sam and I were finishing our meals, Sam talked about the fact that the effect that blogging has had on his visibility is immense. Before blogging he was well known in tight-knit technical circles such as amongst the members of the Apache project but now he knows people from all over the world working at diverse companies and regularly has people go "Wow, you're Sam Ruby, I read your blog". As he said this, the guy sitting across from us at the table said "Wow, you're Sam Ruby, I read your blog". Sam turned to me and said "See what I mean?"

The power of blogging...

Eve Maler: I spoke to her about a talk I'd seen on UBL given by Eduardo Gutentag and Arofan Gregory where they talked about the benefits of putting the polymorphic features of W3C XML Schema to good use in business applications. The specific scenario they described was the following

Imagine a small glue supplier that provides glue to various diverse companies such as a shoe manufacturer, an automobile manufacturer and an office supplies company. This company uses UBL to talk to each of its customers, who also use UBL, but since the types for describing purchase orders and the like are not specific enough for them they use the type derivation features of W3C XML Schema to create specific types (e.g. a hypothetical LineItem type from UBL is derived to AutomobilePart or ShoeComponent by the various companies). However the small glue company can handle all the new types with the same code if they use type-aware processing such as the following XPath 2.0 or XQuery expression, which matches all instances of the LineItem type

element(*, LineItem)

The presenters then pointed out that there could be data loss if one of the customers extended the LineItem type by adding information that was pertinent to their business (e.g. priority, pricing information, preferred delivery options, etc) since such code would not know about the extensions.

This seems like a horrible idea and yet another reason why I view all the "object oriented" features of W3C XML Schema with suspicion.
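The data loss is easy to reproduce in any language with subtyping. Below is a small Python analogy, not UBL or XQuery: `LineItem` and `AutomobilePart` mirror the hypothetical types from the talk, the `priority` field is an invented extension, and the `isinstance` check plays the role of `element(*, LineItem)`.

```python
from dataclasses import dataclass, asdict, fields

@dataclass
class LineItem:  # stand-in for the hypothetical UBL base type
    description: str
    quantity: int

@dataclass
class AutomobilePart(LineItem):  # a customer's derived type with an extension
    priority: str = "normal"

def supplier_view(item):
    """Process any LineItem subtype, reading only the base type's fields."""
    return {f.name: getattr(item, f.name) for f in fields(LineItem)}

order = [
    LineItem("wood glue", 10),
    AutomobilePart("panel adhesive", 3, priority="rush"),
]

# Roughly what element(*, LineItem) does: match the base type and anything derived from it.
matched = [item for item in order if isinstance(item, LineItem)]
seen = [supplier_view(item) for item in matched]

# Fields the customer sent that the supplier's code never looked at.
lost = [set(asdict(item)) - set(view) for item, view in zip(matched, seen)]
```

The supplier's code happily matches the derived item and never notices that the order was marked "rush".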

Eve agreed that it probably was a bad idea to recommend that people process XML documents this way, then stated that calling such processing "polymorphic" didn't sit right with her since true polymorphism doesn't require subtype relationships. I agreed and disagreed with her. There are at least four types of polymorphism in programming language parlance and the kind used above is subtype polymorphism. It is just one of the four (the others being coercion, overloading and parametric polymorphism), but the behavior above is still polymorphism. From talking to Eve it seemed that she was more interested in parametric polymorphism because subtype polymorphism is not a loosely coupled approach. I pointed out that just using XPath expressions to match on predicates could be considered parametric polymorphism since you are treating instances similarly even though they are of different types but satisfy the same constraints. I'm not sure she agreed with me. :)
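For anyone who hasn't run into the four-way classification before, here is one small example of each kind in Python. This is only an approximation: Python has no compile-time overloading, so `functools.singledispatch` stands in for it, and the function and class names are all invented for the illustration.

```python
from functools import singledispatch
from typing import Sequence, TypeVar

T = TypeVar("T")

# Coercion: the int operand is implicitly converted to float.
coerced = 1 + 2.5

# Overloading (ad hoc): one name, separate implementations chosen by argument type.
@singledispatch
def describe(value):
    return "something"

@describe.register
def _(value: int):
    return "an integer"

@describe.register
def _(value: str):
    return "a string"

# Parametric: one implementation that works uniformly for any element type T.
def first(items: Sequence[T]) -> T:
    return items[0]

# Subtype: code written against a base type accepts any derived type.
class Shape:
    def area(self) -> float:
        return 0.0

class Square(Shape):
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return self.side * self.side

def total_area(shapes):
    return sum(shape.area() for shape in shapes)
```

Only the last of these needs a derivation relationship between types, which is presumably what made the "polymorphic" label feel too narrow to Eve.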

Jon Udell: We discussed the online exchange we had about WinFS types and W3C XML Schema types. He apologized if he seemed to be coming on too strong in his posts and I responded that of the hundreds of articles and blog posts I'd read about the technologies unveiled at the recent Microsoft Professional Developer's Conference (PDC) that I'd only seen two people provide insightful feedback; his was the first and Miguel de Icaza's PDC writeup was the second. 

Jon felt that WinFS would be more valuable as an XML database as opposed to an object oriented database (I think the terms he used were "XML store" and "CLR store") especially given his belief that XML enables the "Universal Canvas". I agreed with him but pointed out that Microsoft isn't a single entity and even though some parts may think that XML is one step closer to giving us a universal data interchange format and thus universal data access, there are others who see XML as "that format you use for config files" and express incredulity when they hear about things like XQuery because they wonder why anyone would need a query language for their config files. :)

Reading Jon's blog post about Word 11, XML and the Universal Canvas it seems he's been anticipating a unified XML storage model for a while which explains his disappointment that the WinFS unveiled at PDC was not it.

He also thought that the fact that so many people at Microsoft were blogging was fantastic. 


 

Categories: XML

December 16, 2003
@ 06:52 AM

Robert Scoble writes

Here's what I'd do if I were at Harvard and in charge of the RSS spec:

1) Announce there will be an RSS 3.0 and that it will be the most thought-out syndication specification ever.

2) Announce that RSS 3.0 will ship on July 1, 2005. That date is important. For one, 18 months is long enough to really do some serious work. For two, RSS 3.0 should be positioned as "the best way to do syndication on Microsoft's Longhorn." ...

3) Open up a mailing list, a wiki, and a weblog to track progress on RSS 3.0 and encourage community inclusion.

4) Work with Microsoft to ensure that RSS 3.0 will be able to take advantage of Longhorn's new capabilities (in specific, focus on learning Indigo and WinFS)...

5) Make sure RSS 3.0 is simply the best-of-breed syndication protocol. Translation: don't let Microsoft or Google come up with a better spec that has more features.

I'm terribly amused by the fact that Robert Scoble likes to claim that he doesn't represent Microsoft in his blog then posts items where he basically acts like he does. An RSS 3.0 that competes with Atom is probably the worst possible proposal to resolve the current conflict in the website syndication space and a very clear indication that this is all about personality conflicts. The problem with the Atom syndication format is that it is an incompatible alternative to RSS 1.0/RSS 2.0 which provides little if any benefit to content producers or news aggregator users. Coming up with another version of RSS doesn't change this fact unless it is backwards compatible, and even then, besides clarifications to the original spec, I'm unsure what should be added to the core although I can think of a number of potential candidates. However this still would be a solution looking for a problem.

While talking to Tim Bray and Sam Ruby at XML 2003 last week I stated that a number of the problems with syndication have little to do with the core spec and most aggregator authors wouldn't consider any of the problems harped upon on the Atom lists as a big deal. The major problems with syndication today have little to do with the syndication format and more to do with its associated technologies.

As little interest as I have in an Atom syndication format, I have an order of magnitude less interest in a new version of RSS that exists solely to compete with Atom.

PS: Am I the only one who caught the trademark Microsoft arrogance (which really comes from working on Windows[0]) in Scoble's post? I especially liked

"Here's what I'd do if I were at Harvard and in charge of the RSS spec...Work with Microsoft to ensure that RSS 3.0 will be able to take advantage of Longhorn's new capabilities (in specific, focus on learning Indigo and WinFS). Build a prototype (er, have MSN build one) that would demonstrate some of the features of RSS 3.0 -- make this prototype so killer that it gets used on stage at the Longhorn launch

I literally guffawed out loud. So if Harvard doesn't tie RSS to Windows then all is lost? I guess this means that NetNewsWire and Straw should get ready to be left behind in the new Microsoft-controlled RSS future. Hilarious. 

[0] When you work on the most popular piece of software in the world you tend to have a different perspective from most other software developers in the world including within Microsoft.


 

Categories: Ramblings

December 15, 2003
@ 05:04 PM

James Robertson writes

Ed Foster points out that MS - like many other vendors - is forbidding benchmarks as part of their standard contracts:

Is it possible Microsoft has something to hide about the performance of its server and developer products? It's hard to escape that conclusion when you see how many of its license agreements now contain language forbidding customers of those products from disclosing benchmark results.

...
So what are MS and the other vendors afraid of?

I'm not sure what the official line is on these contracts but I've come to realize why the practice is popular among software vendors. A lot of the time people who perform benchmarks are familiar with one or two of the products they are testing and know how to tune those for optimal performance but not the others which leads to skewed results. I know that at least on the XML team at Microsoft we don't block people from publishing benchmarks if they come to us, we just ensure that their tests are apples-to-apples comparisons and not unfairly skewed to favor the other product(s) being tested.

Just a few days ago I attended a session at XML 2003 entitled A Comparison of XML Processing in .NET and J2EE where the speaker stated that push-based XML parsers like SAX were more performant than pull-based XML parsers like the .NET Framework's XmlReader when dealing with large documents. He didn't give any details and implied that they were lacking because of the aforementioned EULA clauses. Without any details, sample code or a definition of what document size is considered "large" (1MB, 10MB, 100MB, 1GB?) it's difficult to agree or disagree with his statement. Off the top of my head, no inherent limitations of pull-based XML parsing come to mind that should make it perform worse than push-based parsing of XML documents, although differences in implementation can make all the difference. I suspect that occurrences like this are why many software vendors tend to have clauses that limit the disclosure of benchmark information in their EULAs.
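For readers unfamiliar with the distinction, the two models differ in who drives the parse, not in what they can read. The sketch below uses Python's standard library (`xml.sax` for push, `xml.dom.pulldom` for pull) instead of SAX for Java or XmlReader; the tiny feed document is invented, and the example says nothing about relative performance.

```python
import xml.sax
from xml.dom import pulldom

DOC = "<feed><entry/><entry/><entry/></feed>"  # invented sample document

class EntryCounter(xml.sax.ContentHandler):
    """Push model: the parser drives and calls back into this handler."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "entry":
            self.count += 1

def count_push(text):
    handler = EntryCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.count

def count_pull(text):
    """Pull model: the application drives by asking for the next event."""
    count = 0
    for event, node in pulldom.parseString(text):
        if event == pulldom.START_ELEMENT and node.tagName == "entry":
            count += 1
    return count
```

Both functions do the same work over the same input, which is why a performance claim about one model versus the other really needs implementation details and document sizes behind it.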

Disclaimer: The above is my personal opinion and is in no way, shape or form an official statement of the position of my employer.


 

Categories: Life in the B0rg Cube

I'm now experimenting with various Windows CVS clients to see which best suits my needs for RSS Bandit development. So far I have tried WinCVS which seems OK and I'm about to evaluate Tortoise CVS which Torsten seems very happy with.

Later on I'll experiment with CVS plugins that are integrated into Visual Studio such as Jalindi Igloo or the SCC Plugin for Tortoise CVS. I never got to use the Visual Studio plugin for GotDotNet workspaces when RSS Bandit was hosted there because the IDE I originally started developing RSS Bandit with (Visual C# 2002 standard edition) did not support said plugin, so I am curious as to what development with source repository access as part of the IDE feels like.


 

Categories: RSS Bandit

December 15, 2003
@ 04:32 PM

It seems my recent post about moving RSS Bandit from GotDotNet Workspaces to SourceForge has led to some discussion about the motivations for the move. I've seen this question asked on Daniel Cazzulino's weblog and on the RSS Bandit message board on GotDotNet. Below is my answer to the question phrased in the form of a top 10 list which I posted in response to the question on the GotDotNet message board and also sent to Andy Oakley

Top 10 reasons why we moved to SourceForge

1. Doesn't require people have Passport accounts to download the RSS Bandit installer.

2. We get download and page load statistics.

3. Bug reports can have file attachments. This is great since a lot of the time we end up wishing people would attach their error.log or feedlist.xml file with their bug reports.

4. We can get a mailing list if we want.

5. Separate databases for features vs. bugs.

6. Source code can be browsed over HTTP via ViewCVS without having to install any software

7. Larger quotas on how much you can store on their servers.

8. Bug tracker remembers your queries and the default query is more useful to me (all open bugs) than GDN's (all bugs assigned to me even closed ones).

9. Activity score more accurately reflects activity of the project (on GDN, BlogX is scored at having 99% activity score even though the project has been dead for all intents and purposes for several months).

10. With SourceForge we get to use the BSD licence.

I hope this satisfies the curiosity of those wondering why RSS Bandit moved to SourceForge. I've been using it for a few days and I'm already much happier with it despite some initial teething problems adding modules to CVS.


 

Categories: RSS Bandit

According to Reuters

WASHINGTON (Reuters) - A Pentagon audit of Halliburton, the oil services firm once run by Vice President Dick Cheney, found the company may have overbilled the U.S. government by more than $120 million on Iraq contracts, U.S. defense officials said on Thursday.

Why am I not surprised? This entire Iraq war fiasco will be the subject of much consternation and entertainment to future generations.


 

December 12, 2003
@ 12:19 PM

Today is the last day of the XML 2003 conference. So far it's been a pleasant experience.

XML IN THE REAL WORLD

Attendance at the conference was much lower than last year. Considering that last year Microsoft announced Office 2003 at the conference while this year there was no such major event, this is no surprise. I suspect another reason is that XML is no longer new and is now so low down in the stack that a conference dedicated to just XML is no longer that interesting. Of course, this is only my second conference so this level of attendance may be typical from previous years and I may have just witnessed an abnormality last year.

Like last year, the conference seemed targeted mostly at the ex-SGML crowd (or document-centric XML users) although this time there wasn't the significant focus on Semantic Web technologies such as topic maps that I saw last year. I did learn a new buzzword around Semantic Web technologies, Semantic Integration, and found out that there are companies selling products that claim to do what until this point I'd assumed was mostly theoretical. I tried to ask one such vendor how they deal with some of the issues with non-trivial transformation such as the pubDate vs. dc:date example from a previous post; he glossed over the details but implied that besides using ontologies to map between vocabularies they allowed people to inject code where it was needed. This seems to confirm my suspicions that in the real world you end up either using XSLT or reinventing XSLT to perform transformations between XML vocabularies.
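As a reminder of why this kind of mapping needs code and not just an ontology saying "these two elements mean the same thing", here is a minimal Python sketch. It assumes the usual conventions, namely that pubDate carries an RFC 822 style date and dc:date an ISO 8601 style one; the function name and the sample date are made up.

```python
from email.utils import parsedate_to_datetime

def pubdate_to_dcdate(pub_date: str) -> str:
    """Convert an RFC 822 style date to an ISO 8601 style string."""
    return parsedate_to_datetime(pub_date).isoformat()

converted = pubdate_to_dcdate("Fri, 12 Dec 2003 12:19:00 -0800")
# converted is "2003-12-12T12:19:00-08:00"
```

Knowing that the two elements are equivalent gets you nowhere until something actually reformats the value, and that something is the injected code.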

From looking at the conference schedule, it is interesting to note that some XML technologies got a lot less coverage in the conference relative to how much discussion they cause in the news or blogosphere. For example, I didn't see any sessions on RSS although there is one by Sam Ruby on Atom scheduled for later this morning. Also there didn't seem to be much about XML Web Service technologies being produced by the major vendors such as IBM, BEA or Microsoft. I can't tell if this is because there was no interest in submitting such sessions or whether the folks who picked the sessions didn't find these technologies interesting. Based on the fact that a number of the folks who had "Reviewer" on their conference badge were from the old school SGML crowd I suspect the latter. There definitely seemed to be a disconnect between the technologies covered during the conference and how XML is used in the real world in a number of cases.

MEETING XML GEEKS

I've gotten to chat with a number of people I've exchanged mail with but never met including Tim Bray, Jon Udell, Sean McGrath, Norm Walsh and Betty Harvey. I also got to talk to a couple of folks I met last year like Rick Jelliffe, Sam Ruby, Simon St. Laurent, Mike Champion and James Clark. Most of the hanging out occurred at the soiree at Tim and Lauren's. As Tim mentions in his blog post there were a couple of "Wow, you're Dare?" or "Wow, you're Sean McGrath?" moments throughout the evening. The coolest part of that evening was that I got to meet Eve Maler, who I was all star struck about meeting since I'd been seeing her name crop up as one of the Über-XML geeks at Sun Microsystems since I was a programming whelp back in college. There I was gushing "Wow, you're Eve Maler" and she was like "Oh you're Dare? I read your articles, they're pretty good". Sweet. Since Eve worked at Sun I intended to give her some light-hearted flak over a presentation entitled UBL and Object-Oriented XML: Making Type-Aware Systems Work, which was spreading the notion that relying on the "object oriented" features of W3C XML Schema was a good idea, but it turned out that she agreed with me. Methinks another W3C XML Schema article on XML.com could be spawned from this. Hmmmm.


 

Categories: XML

December 11, 2003
@ 03:42 PM

The new home of the RSS Bandit project is on SourceForge. Various things precipitated this move with the most recent being the fact that a Passport account was needed to download RSS Bandit from GotDotNet. I'd like to thank Andy Oakley for all his help with  GotDotNet Workspaces while RSS Bandit was hosted on there.

The most current release of RSS Bandit is still v1.2.0.61, you can now download it from sourceforge here. The source code is still available, and you can now browse the RSS Bandit CVS repository if interested in such things.


 

Categories: RSS Bandit