Since I wrote my What is Google Building? post, I've seen lots of interesting responses to it in my referrer logs. As usual, Jon Udell's response gave me the most food for thought. In his post entitled Bloglines he wrote

Finally, I'd love it if Bloglines cached everything in a local database, not only for offline reading but also to make the UI more responsive and to accelerate queries that reach back into the archive.

Like Gmail, Bloglines is the kind of Web application that surprises you with what it can do, and makes you crave more. Some argue that to satisfy that craving, you'll need to abandon the browser and switch to RIA (rich Internet application) technology -- Flash, Java, Avalon (someday), whatever. Others are concluding that perhaps the 80/20 solution that the browser is today can become a 90/10 or 95/5 solution tomorrow with some incremental changes.
...
It seems pretty clear to me. Web applications such as Gmail and Bloglines are already hard to beat. With a touch of alchemy they just might become unstoppable.

This does seem like the missing part of the puzzle. The big problem with web applications (aka thin client applications) is that they cannot store a lot of local state. I use my primary mail readers (Outlook & Outlook Express) offline and I use my primary news aggregator (RSS Bandit) offline on my laptop when I travel or in meetings when I can't get a wireless connection. There are also lots of dial-up users out there who don't have the luxury of an 'always on' broadband connection and who also rely on the offline capabilities of such applications.

I suspect this is one of the reasons Microsoft stopped trying to frame the argument as thin client vs. rich client. That framing basically argues that an application with zero deployment and a minimalist user interface is inferior to a desktop application that needs to be installed, updated and patched but has a fancier GUI. This argument holds little water with most people, which is why the popularity of Web applications has grown both on the Internet and on corporate intranets.

Microsoft has attempted to tackle this problem in two ways. The first is to make rich client applications as easy to develop and deploy as web applications by creating a rich client markup language, XAML, as well as the ClickOnce application deployment technology. The second is better positioning: emphasizing the offline capabilities of rich clients and coming up with a new moniker for them, smart clients.

Companies that depend on thin client applications, such as Google with GMail, do realize these failings. However, Google is in the unique position of being able to attract some very smart people who've been working on this problem for a while. For example, their recent hire Adam Bosworth wrote about technologies for addressing this limitation of thin clients in a number of blog posts from last year: Web Services Browser, Much delayed continuation of the Web Services Browser and When connectivity isn't certain. The latter post definitely has some interesting ideas, such as

the issue that that I want a great user experience even when not connected or when conected so slowly that waiting would be irritating. So this entry discusses what you do if you can't rely on Internet connectivity.

Well, if you cannot rely on the Internet under these circumstances, what do you do? The answer is fairly simple. You pre-fetch into a cache that which you'll need to do the work. What will you need? Well, you'll need a set of pages designed to work together. For example, if I'm looking at a project, I'll want an overview, details by task, breakout by employee, late tasks, add and alter task pages, and so on. But what happens when you actually try to do work such as add a task and you're not connected? And what does the user see.

To resolve this, I propose that we separate view from data. I propose that a "mobile page" consists both of a set of related 'pages' (like cards in WML), an associated set of cached information and a script/rules based "controller" which handles all user gestures. The controller gets all requests (clicks on Buttons/URL's), does anything it has to do using a combination of rules and script to decide what it should do, and then returns the 'page' within the "mobile page" to be displayed next. The script and rules in the "controller" can read, write, update, and query the associated cache of information. The cache of information is synchronized, in the background, with the Internet (when connected) and the mobile page describes the URL of the web service to use to synchronize this data with the Internet. The pages themselves are bound to the cache of information. In essence they are templates to be filled in with this information. The mobile page itself is actually considered part of the data meaing that changes to it on the Internet can also be synchronized out to the client. Throw the page out of the cache and you also throw the associated data out of the cache.

Can you imagine using something like GMail, Google Groups or Bloglines in this kind of environment? That definitely would put the squeeze on desktop applications.
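To make Bosworth's idea a bit more concrete, here's a rough C# sketch of the kind of local cache with background synchronization he describes. This is entirely hypothetical code of my own (none of the names come from his posts): the application reads and writes against the cache immediately, and pending changes are pushed to a web service whenever connectivity is available.

    // Hypothetical sketch of an offline-first cache in the spirit of Bosworth's
    // "mobile page" idea: the UI reads and writes local data, and a background
    // step synchronizes pending changes with a web service when connected.
    using System;
    using System.Collections.Generic;

    class OfflineCache
    {
        private readonly Dictionary<string, string> data = new Dictionary<string, string>();
        private readonly Queue<KeyValuePair<string, string>> pendingWrites =
            new Queue<KeyValuePair<string, string>>();

        public string Read(string key)
        {
            string value;
            return data.TryGetValue(key, out value) ? value : null;
        }

        public void Write(string key, string value)
        {
            data[key] = value;   // the local view is updated immediately, even offline
            pendingWrites.Enqueue(new KeyValuePair<string, string>(key, value));
        }

        // Called periodically; syncUrl is the web service the "mobile page" names.
        public void Synchronize(bool connected, string syncUrl)
        {
            if (!connected) return;   // keep queuing changes while offline
            while (pendingWrites.Count > 0)
            {
                KeyValuePair<string, string> change = pendingWrites.Dequeue();
                // e.g. POST the change to syncUrl and merge any server-side updates
                Console.WriteLine("Syncing {0} to {1}", change.Key, syncUrl);
            }
        }
    }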


 

Categories: Technology

About a week ago my article Designing Extensible, Versionable XML Formats appeared on XML.com. However due to a “pilot error” on my end I didn't send the final draft to XML.com. By the time I realized my mistake the article was already live and changing it would have been cumbersome since there were a few major changes in the article.

You can read the final version of the article Designing Extensible, Versionable XML Formats on MSDN. The main differences between the MSDN article and the XML.com one are

  1. Added sections on Message Transfer Negotiation vs. Versioning Message Payloads and Version Numbers vs. Namespace Names

  2. Added more content to the section Using XML Schema to Design an Extensible XML Format, especially around the usage of substitution groups, xsi:type and xs:redefine.

  3. Amended all sample schemas to use blockdefault="#all".

  4. Added an Acknowledgements section

  5. The schema in the section New constructs in a new namespace approach uses a fixed value instead of a default value for the mustUnderstand attribute on the isbn element.


 

Categories: XML

July 25, 2004
@ 12:30 AM

Recently there have been some complaints about duplicate entries showing up in RSS Bandit. This is due to a change I made in the most recent version of RSS Bandit. In RSS 2.0 there is an optional guid element that can be used to uniquely identify an item in an RSS feed. Unfortunately, since this element is optional, most aggregators end up using the link element instead for feeds that don't provide guids.

For the most part this worked fine. However I stumbled across a feed that used the same link for each item from a given day: the Cafe con Leche RSS feed. This meant that RSS Bandit couldn't differentiate between items posted on the same day, which was particularly important when tracking which items a user had read or whether an item had already been downloaded. I should have pinged the owner of the feed to point this problem out, but instead I decided to code around the issue by using the combination of the link and title elements to uniquely identify items. This actually turned out to be worse.

Although this fixed the problem with the Cafe con Leche RSS feed, it caused other issues. Any time an item in a feed changes its title but keeps the same permalink (for example, if a typo in the title is fixed), RSS Bandit thinks it's a different post and a duplicate entry shows up in the list view. Since popular sites like Boing Boing and Slashdot tend to do this almost every other day, I turned a problem with a niche site that affects a few users into one that affects a number of popular websites, and thus lots of users.

This problem will be fixed in the next version of RSS Bandit.
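For what it's worth, a minimal sketch of the identity logic described above (illustrative code, not the actual RSS Bandit implementation) would prefer the guid when present and otherwise fall back to the link alone rather than link plus title:

    // Sketch (not actual RSS Bandit code) of picking a stable identifier for an
    // RSS item: use the optional guid when present, otherwise fall back to link.
    // Keying off link+title is what caused the duplicate entry bug described above.
    using System.Xml;

    static class RssItemIdentity
    {
        public static string GetItemKey(XmlElement item)
        {
            XmlNode guid = item.SelectSingleNode("guid");
            if (guid != null && guid.InnerText.Length > 0)
                return guid.InnerText;   // the reliable case

            XmlNode link = item.SelectSingleNode("link");
            return link != null ? link.InnerText : null;   // imperfect for feeds like Cafe con Leche
        }
    }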


 

Categories: RSS Bandit

July 24, 2004
@ 08:51 PM

In the past couple of months Google has hired four people who used to work on Internet Explorer in various capacities [especially its XML support] and who then moved to BEA: David Bau, Rod Chavez, Gary Burd and most recently Adam Bosworth. A number of my coworkers used to work with these guys since our team, the Microsoft XML team, was once part of the Internet Explorer team. It's been interesting chatting in the hallways with folks contemplating what Google would want to build that requires people with a background in building XML data access technologies both on the client side (Internet Explorer) and on the server (BEA's WebLogic).

Another interesting recent Google hire is Joshua Bloch. He is probably the most visible person working on the Java language at Sun after James Gosling. Based on recent interviews with Joshua Bloch about Java, his latest endeavors involved adding new features to the language that mimic those in C#.

While chomping on some cheap sushi at Sushi Land yesterday, some friends and I wondered what Google could be planning next. So far, the software industry, including my employer, has been playing catch-up with Google and reacting to their moves. According to news reports MSN is trying to catch up to Google search and Hotmail is upping its free storage limit to compete with GMail. However this is all reactive and we still haven't seen significant competition to Google News, Google Image Search, Google Groups or, to a lesser extent, Orkut and Blogger. By the time the major online networks like AOL, MSN or Yahoo! can provide decent alternatives to this media empire, Google will have produced their next major addition.

So far Google doesn't seem to have stitched all its pieces into a coherent media empire as competitors like Yahoo! have done but this seems like it will only be a matter of time. What is of more interest to the geek in me is what Google could build next that could tie it all together. As Rich Skrenta wrote in his post the Secret Source of Google's Power

Google is a company that has built a single very large, custom computer. It's running their own cluster operating system. They make their big computer even bigger and faster each month, while lowering the cost of CPU cycles. It's looking more like a general purpose platform than a cluster optimized for a single application.

While competitors are targeting the individual applications Google has deployed, Google is building a massive, general purpose computing platform for web-scale programming.

A friend of mine, Justin, had an interesting idea at dinner yesterday. What if Google ends up building the network computer? They can give users the storage space and reliability to place all their data online. They can mimic the major desktop applications users interact with daily by using Web technologies. This sounds far-fetched, but then again, I'd never have imagined I'd see a free email service that gives users 1GB of storage.

Although I think Justin's idea is outlandish, I suspect the truth isn't much further from that.

Update: It seems Google also picked up another Java language guy from Sun: Neal Gafter, who worked on various Java compiler tools including javac, javadoc and javap. Curiouser and curiouser.


 

Categories: Technology

During the most recent Download.Ject Internet Explorer incident [which was significant enough that I saw newspaper headlines and TV news reports advising people to switch browsers] I got some requests from RSS Bandit users to switch the browser used by RSS Bandit, since they'd switched away from Internet Explorer due to security concerns.

Torsten and I looked around to see how feasible this would be and found the Mozilla ActiveX control which enables one to embed the Mozilla browser engine (Gecko) into any ActiveX application. The control implements the same APIs as the Internet Explorer control so it may be straightforward to make this change. 

I have some concerns about doing this.

  1. We've had weird COM interop interactions between RSS Bandit and IE which result in bugs like dozens of IE windows being spawned and, most recently, memory corruption errors. I am wary of moving to an unknown quantity like Gecko and facing similar issues without the benefit of a background of working with the component.

  2. There's a question of whether we replace our dependency on IE or ship an option to use Gecko instead of IE. Or whether we just ship a Gecko version and an IE version. The installer for the Mozilla ActiveX control is currently larger than the RSS Bandit download so we'd more than double the size of our download if we tied ourselves to Gecko.

I'm curious as to what RSS Bandit users think. Currently I don't think I'm going to add such a switch to our plans, but I am always interested in feedback from our users on what they think the right thing to do should be.
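If we did go down this road, one way to avoid hard-wiring either engine would be to hide the embedded browser behind a small interface and pick an implementation from a config setting. The sketch below is only an illustration of that approach; AxWebBrowser is the IE ActiveX wrapper, while the exact type name of the Gecko wrapper is an assumption I haven't verified.

    // Rough sketch of abstracting the embedded browser so the rest of the code
    // doesn't depend on a specific engine. AxSHDocVw.AxWebBrowser is the IE
    // ActiveX wrapper generated by the interop tools.
    using System.Windows.Forms;

    public interface IHtmlRenderer
    {
        Control BrowserControl { get; }   // hosted in the reading pane
        void Navigate(string url);
    }

    public class InternetExplorerRenderer : IHtmlRenderer
    {
        private readonly AxSHDocVw.AxWebBrowser browser = new AxSHDocVw.AxWebBrowser();

        public Control BrowserControl { get { return browser; } }

        public void Navigate(string url)
        {
            object empty = null;
            browser.Navigate(url, ref empty, ref empty, ref empty, ref empty);
        }
    }

    // A GeckoRenderer : IHtmlRenderer would wrap the Mozilla ActiveX control the
    // same way, and a configuration setting would decide which one gets created.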


 

Categories: RSS Bandit

July 21, 2004
@ 07:15 AM

It seems every 3 months some prominent online publication complains about the amount of traffic RSS news readers cause to websites that provide RSS feeds. This time it is Slashdot with their post When RSS Traffic Looks Like a DDoS which references a post by Chad Dickerson, the CTO of Infoworld, entitled RSS growing pains. Chad writes

Several months ago, I spoke to a Web architect at a large media site and asked why his site didn’t support RSS. He raised the concern that thousands (or even millions) of dumb clients could wreak havoc on a popular Web site. Back when I was at CNN.com, I recall that our servers got needlessly pounded by a dumb client (IE4) requesting RSS-like CDF files at frequent intervals regardless of whether they had changed. As the popularity of RSS feeds at InfoWorld started to surge, I began to notice that most of the RSS clients out there requested and downloaded our feeds regardless of whether the feeds themselves had changed. At the time, we hadn’t quite reached the RSS tipping point, so I filed these thoughts away for later -- but “later” came sooner than I thought.

At this point I'd like to note that HTTP provides two mechanisms (the If-Modified-Since/Last-Modified and If-None-Match/ETag header pairs) for web servers to tell clients whether a network resource has changed. The basics are explained in the blog post HTTP Conditional Get for RSS Hackers, which shows how to prevent clients such as news readers from repeatedly downloading a Web document that hasn't been updated. At the current time, the InfoWorld RSS feed supports neither mechanism.

Another technique for reducing bandwidth consumption by HTTP clients is HTTP compression, which greatly reduces the amount of data that has to be sent to a client when the feed does have to be downloaded. For example, the current InfoWorld feed is 7427 bytes, which shrinks to 2551 bytes when compressed using GZip on my home machine. That's roughly a factor of three reduction, and on larger files the compression ratio is even better. Again, InfoWorld doesn't support this technique for reducing bandwidth consumption.

It is unsurprising that they are seeing significant bandwidth consumption from news aggregators. An RSS reader polling the InfoWorld site once an hour over an 8 hour period would download about 60 kilobytes of XML; if, on the other hand, the site supported HTTP conditional GET requests and HTTP compression via GZip encoding, this number would be under 3 kilobytes.
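To make the fix concrete, here's a minimal C# sketch of fetching a feed with conditional GET and gzip compression. This is illustrative code (assuming .NET 2.0 for GZipStream), not the code any particular aggregator uses:

    // Fetch a feed using HTTP conditional GET plus gzip compression.
    // lastModified and etag are whatever values were saved from the previous fetch.
    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Net;

    class ConditionalFeedFetcher
    {
        static string FetchFeed(string url, DateTime lastModified, string etag)
        {
            HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
            request.Headers["Accept-Encoding"] = "gzip";        // ask for a compressed response
            if (lastModified != DateTime.MinValue)
                request.IfModifiedSince = lastModified;         // sends If-Modified-Since
            if (etag != null)
                request.Headers["If-None-Match"] = etag;        // sends If-None-Match

            try
            {
                using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
                {
                    Stream body = response.GetResponseStream();
                    if (response.ContentEncoding.ToLower().IndexOf("gzip") >= 0)
                        body = new GZipStream(body, CompressionMode.Decompress);
                    using (StreamReader reader = new StreamReader(body))
                        return reader.ReadToEnd();              // caller saves the new Last-Modified/ETag
                }
            }
            catch (WebException ex)
            {
                HttpWebResponse response = ex.Response as HttpWebResponse;
                if (response != null && response.StatusCode == HttpStatusCode.NotModified)
                    return null;                                // 304: feed unchanged, nothing downloaded
                throw;
            }
        }
    }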

The one thing that HTTP doesn't provide is a way to deal with numerous clients hitting a site at once. However this problem isn't much different from the traditional scaling problem that web sites have to deal with today when they get a lot of traffic from regular readers.


 

Today Arpan (the PM for XML query technologies in the .NET Framework) and I were talking about features we'd like to see on our 'nice to have' list for the Orcas release of the .NET Framework. One of the things we thought would be really nice to see in the System.Xml namespace was XPath 2.0. Then Derek, being the universal pessimist, pointed out that we already have APIs supporting XPath 1.0 that only take a string as an argument (e.g. XmlNode.SelectNodes), so we'd have difficulty adding support for another version of XPath without contorting the API.

Not to be dissuaded, I pointed out that XPath 2.0 has a backwards compatibility mode which makes it compatible with XPath 1.0. Thus we wouldn't have to change our Select methods or introduce new methods for XPath 2.0 support, since all queries that worked against our Select methods in the past would still work if we upgraded our XPath implementation to version 2.0. This is where Arpan hit me with the one-two punch. He introduced me to a section of the XPath 2.0 spec called Incompatibilities when Compatibility Mode is true which reads

The list below contains all known areas, within the scope of this specification, where an XPath 2.0 processor running with compatibility mode set to true will produce different results from an XPath 1.0 processor evaluating the same expression, assuming that the expression was valid in XPath 1.0, and that the nodes in the source document have no type annotations other than xdt:untypedAny and xdt:untypedAtomic.

I was stunned by what I read and I am still stunned now. The W3C created XPath 2.0, which is backwards incompatible with XPath 1.0, and added a compatibility mode option intended to make it backwards compatible with XPath 1.0, yet it still isn't backwards compatible even in that mode? This seems completely illogical to me. What is the point of having a backwards compatibility mode if it isn't backwards compatible? Well, I guess now I know that if we do decide to ship XPath 2.0 in the future we can't just add support for it transparently to our existing classes without causing some API churn. Unfortunate.
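To illustrate the API problem, consider the sketch below. The second SelectNodes call shown in the comment is purely hypothetical, not an actual .NET API; the point is that today's methods have nowhere to indicate which XPath version an expression targets.

    // Sketch of why the existing API is a problem: SelectNodes takes only an
    // expression string, so there's no way to say which XPath version it uses.
    using System.Xml;

    class XPathVersioningProblem
    {
        static void Main()
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml("<books><book price='10'/><book price='25'/></books>");

            // Today this is always evaluated by the XPath 1.0 engine. If we silently
            // swapped in XPath 2.0, even compatibility mode wouldn't guarantee the
            // same results for every expression.
            XmlNodeList cheap = doc.SelectNodes("/books/book[@price < 20]");

            // A hypothetical overload (NOT an actual .NET API) that would make the
            // version explicit instead of forcing a silent engine upgrade:
            // XmlNodeList cheap2 = doc.SelectNodes("/books/book[@price < 20]", XPathVersion.XPath2);
        }
    }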

Caveat: The fact that a technology is mentioned as being on our 'nice to have' list or is suggested in a comment to this post is not an indication that it will be implemented in future versions of the .NET Framework.


 

Categories: XML

July 17, 2004
@ 02:40 AM

Dave Winer writes

Russ Beattie says we should be careful not to give the Republicans ammo to kill Kerry. I am sorry Russ, I'm not worried about that. I'm more worried that the Dems are too flustered by the hardball tacticts of the Reps to fight back.

The only time I tend to watch regular TV that isn't TiVo is while working out in the morning at the health club. I've noticed that while John Kerry's ads tend to be about the qualities that  make him a good candidate for president, George Bush's ads have mostly been negative ads attacking John Kerry. Personally I would love it if Kerry's campaign continues to take the high ground and shows the Republican party up for the rabid attack dogs that they are. The problem with this is that negative ads work and some people tend to look at not hitting back as a sign of weakness, which is what it seems Dave Winer is doing.

Whatever happened to trying to change the tone in Washington and elevate the discourse? Just another case of "Do what I say, not what I do" I guess.

 


 

Categories: Ramblings

I was reading an XML-Deviant column on XML.com entitled Browser Boom when I came across the following excerpt

The inevitable association with Microsoft's CLI implementation is proving a source of difficulty for the Mono project. The principal author of Mono's XML support, Atsushi Eno, posted to the Mono mailing list on the problems of being conformant in Mono's XML parser implementation. More specifically, whose rules should Mono conform to. W3C or Microsoft?

MS XmlTextReader is buggy since it accepts XML declaration as element content (that violates W3C XML specification section 3 Logical Structures). ... However, there is another discussion that it is useful that new XmlTextReader (xmlText, XmlNodeType.Element, null) accepts XML declaration.

... that error-prone XmlTextReader might be useful (especially for people who already depends on that behavior)

... we did not always reject Microsoft badness; for example we are copying System.Xml.XmlCDataSection that violates W3C DOM interface hierarchy (!)

The root of the dilemma is similar to that which Mozilla and Opera are trying to manage in the browser world.

What I find interesting is that instead of pinging the MSFT XML folks (like myself) and filing a bug report, this spawned a dozen-message email discussion on whether Mono should be bug compatible with the .NET Framework. Of course, if the Mono folks decide to be bug compatible with this and other bugs in System.Xml, and we then fix them and cause breaking changes in some cases, will we see complaints about how Microsoft is out to get them by being backwards incompatible? Now that Microsoft has created the MSDN Product Feedback Center they don't even have to track down the right newsgroup or the email address of a Microsoft employee to file the bug.

It's amazing to me how much work people create for themselves and how many conspiracy theories they'd rather entertain than simply communicate with others.

Update: I talked to the developer responsible for the XmlTextReader class and she responded, "This is by design. We allow XML declaration in XML fragments because of the encoding attribute. Otherwise the encoding information would have to be transferred outside of the XML and manually set into XmlParserContext."
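For anyone curious, here's a small sketch of the behavior being discussed. I've left the encoding attribute out of the declaration for simplicity; the design rationale is that fragments read from byte streams can carry their encoding information in the declaration rather than having it set separately via XmlParserContext.

    // Sketch: parsing an XML fragment that begins with an XML declaration.
    // System.Xml accepts this in fragment mode by design (see the update above),
    // whereas a declaration inside element content of a full document is an error.
    using System;
    using System.Xml;

    class FragmentDemo
    {
        static void Main()
        {
            string fragment = "<?xml version='1.0'?><item>hello</item>";

            XmlTextReader reader = new XmlTextReader(fragment, XmlNodeType.Element, null);
            while (reader.Read())
                Console.WriteLine("{0}: {1}", reader.NodeType, reader.Name);
        }
    }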


 

Categories: Life in the B0rg Cube | XML

Taken from an article on TheServerSide.com entitled Microsoft Responds to Sun’s Web Service Benchmarks

In a paper published last month, Sun claimed that Java based web services outperform .NET based web services both in throughput and response times. Microsoft has released a paper on TheServerSide.NET responding to those claims stating that Sun’s representation of the .NET performance was understated by 2 to 3 times and that in many, but not all cases, .NET exceeded the Java benchmarks.
...
Read the Microsoft response on TheServerSide.NET: Web Services Performance: Comparing J2EE and .NET

Read Sun's original paper: J2EE claimed to have better Web Services performance than .NET

It should be noted that Sun did not publish the source code for their benchmark, so Microsoft had to re-create Sun's benchmark based on the details in the original paper. The Microsoft response has the full source code for both the .NET Web Service implementation and the Java Web Service implementation using Sun's JWSDP 1.4, along with the test program used to benchmark both services. I've always believed the best way to verify a benchmark is to run it yourself. The performance of the .NET XML Web Service implementations should prove to be a lot better than what is implied by the original paper from Sun.


 

July 14, 2004
@ 09:14 PM

In the midst of a back-and-forth internal discussion on whether it is appropriate for folks to be griping about the recently announced MSFT benefit cuts on their work-related blogs, someone sent me a link to the Mini-Microsoft blog, which describes itself thusly

Let's slim down Microsoft into a lean, mean, efficient customer pleasing profit making machine! Mini-Microsoft, Mini-Microsoft, lean-and-mean!

Subscribed!!!


 

A little while ago some members of our team experimented with various ways to reduce the Relational<->Objects<->XML (ROX) impedance mismatch by adding concepts and operators from the relational and XML (specifically W3C XML Schema) worlds into an object oriented programming language. This effort was spearheaded by a number of smart folks on our team including Erik Meijer, Matt Warren, Chris Lovett and a bunch of others, all led by William Adams. The object oriented programming language which was used as a base for extension was C#. The new language was once called X# but eventually became known as Xen.

Erik Meijer presented Xen at XML 2003 and I blogged about his presentation after the conference. There have also been two papers published about the ideas behind Xen: Programming with Rectangles, Triangles, and Circles and Unifying Tables, Objects and Documents. It's a new year and the folks working on Xen have moved on to other endeavors related to future versions of Visual Studio and the .NET Framework.

However Xen is not lost. It is now part of the Microsoft Research project Cω (pronounced C-Omega). Even better, you can download a preview of the Cω compiler from the Microsoft Research downloads page.


 

Categories: Technology | XML

Torsten has a blog post about an interesting bug in RSS Bandit. If you are subscribed to both Joe Gregorio's and Ian Hixie's blogs, then one of the entries in Ian Hixie's blog appears with the wrong date. The post that appears with the incorrect date is State of the WHAT from Ian Hixie's blog, which is linked to from Joe Gregorio's post 3270 Redux. Instead of being dated 2004-06-29, as it appears in Ian's RSS feed, it is dated 2004-06-05, which is the date of Joe's post.

The problem arises from a workaround we came up with to deal with feeds that don't provide dates. Many users dislike feeds that don't have dates and prefer that we display some default date for such feeds. What we ended up doing was using the date the item was first seen in the feed as the date for each item. In many cases this date isn't accurate. The inaccuracy is particularly glaring when a post from a feed with dates links to one from a feed without dates, because it may look like a feed is linking to a post in the future. For example, Joe Gregorio's post dated 2004-06-05 links to a post made by Ian Hixie on 2004-06-29. In this case that is valid because Joe Gregorio went back and edited his blog post but didn't update the date in his feed. However RSS Bandit assumes the discrepancy in the dates exists because we guessed the date for the entry in Ian's blog, and thus corrects it by aligning it with the date from Joe's entry. The rationale for this behavior is that guessing that an undated entry was posted on the same day someone linked to it is more accurate than guessing that it was posted when it was fetched. The bug is that when we apply this heuristic we don't check whether the entry whose date is being adjusted is actually an undated entry.

This has been fixed in the current codebase. The next question is whether we should actually be adjusting dates in this manner in any case.
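In sketch form, the corrected heuristic amounts to something like the code below. This is a simplification for illustration rather than the actual RSS Bandit code: an entry's date only gets realigned if that date was guessed in the first place.

    // Simplified sketch (not the actual RSS Bandit code) of the corrected heuristic:
    // only entries whose dates were guessed at download time get realigned to the
    // date of a dated entry that links to them.
    using System;

    class FeedItem
    {
        public DateTime Date;
        public bool DateWasGuessed;   // true when the feed supplied no date and we used the fetch time
    }

    static class DateHeuristic
    {
        public static void AdjustLinkedItemDate(FeedItem linkingItem, FeedItem linkedItem)
        {
            // The fix: leave dated entries (like Ian's) alone, even when the linking
            // post appears to reference them "from the past".
            if (!linkedItem.DateWasGuessed)
                return;

            // Guessing "posted around the time someone linked to it" beats
            // guessing "posted when we first fetched it".
            if (linkedItem.Date > linkingItem.Date)
                linkedItem.Date = linkingItem.Date;
        }
    }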


 

Categories: RSS Bandit

July 11, 2004
@ 12:50 AM

In a post entitled Dare Obasanjo is raining on the W3C's parade, Mike Dierken responds to my recent post which asks Is the W3C Becoming Irrelevant? by writing

Either way the primary mechanism the W3C uses to produce technology specs is to take a bunch of contradictory and conflicting proposals and then have a bunch of career bureaucrats try to find some compromise that is a union of all the submitted specs

Damn those career bureaucrats that built XML. Or is it the SOAP design process that caused the grief? And where did that technology come from anyway?

My original post already described the specs that have caused grief and that show the W3C is losing its way. I assume that Mike is trying to use XML 1.0 and SOAP 1.1 as counterexamples to the trend I pointed out. Well, first of all, XML 1.0 was a proposal to design a subset of SGML, so by definition it could not suffer the same problems that face the attempts to innovate by committee which have hampered the W3C in recent times. Also, when XML 1.0 was created the effort was much smaller and a majority of the participants in the subsetting of SGML had similar goals. As for SOAP 1.1, it isn't a W3C spec. SOAP 1.1 was created by Don Box, Dave Winer and a bunch of Microsoft and IBM folks and then submitted to the W3C as a W3C Note.

Of course, the W3C has created iterations of both specs (XML 1.1 & SOAP 1.2) which in both cases are backwards incompatible with the previous versions. I leave it as an exercise to the reader to decide whether having backwards incompatible point releases of Web specifications is how one 'leads the Web to its full potential'.


 

Categories: XML

July 10, 2004
@ 06:09 PM

While reading Dave Winer's blog today I stumbled on a link to the New York Times editorial on the Senate Intelligence Committee's recent report. Below is an excerpt

In a season when candor and leadership are in short supply, the Senate Intelligence Committee's report on the prewar assessment of Iraqi weapons is a welcome demonstration of both. It is also disturbing, and not just because of what it says about the atrocious state of American intelligence. The report is a condemnation of how this administration has squandered the public trust it may sorely need for a real threat to national security.

The report was heavily censored by the administration and is too narrowly focused on the bungling of just the Central Intelligence Agency. But what comes through is thoroughly damning. Put simply, the Bush administration's intelligence analysts cooked the books to give Congress and the public the impression that Saddam Hussein had chemical and biological weapons and was developing nuclear arms, that he was plotting to give such weapons to terrorists, and that he was an imminent threat.

These assertions formed the basis of Mr. Bush's justifications for war. But the report said that they were wrong and were not a true picture of the intelligence, and that the intelligence itself was not worth much. The freshest information from human sources was more than four years old. The committee said the analysts who had produced that false apocalyptic vision had fallen into a "collective groupthink" in which evidence was hammered into a preconceived pattern. Their bosses did not intervene.

The report reaffirmed a finding by another panel investigating intelligence failures before the 9/11 attacks in saying that there was no "established formal relationship" between Saddam Hussein and Al Qaeda. It also said there was no evidence that Iraq had been complicit in any attack by Osama bin Laden, or that Saddam Hussein had ever tried to use Al Qaeda for an attack. Although the report said the C.I.A.'s conclusions had been "widely disseminated" in the government, Mr. Bush and Vice President Dick Cheney have repeatedly talked of an Iraq-Qaeda link.

Sadly, the investigation stopped without assessing how President Bush had used the incompetent intelligence reports to justify war.

It is now quite clear that GW Bush and his cronies started a war that has claimed the lives of hundreds of Americans and thousands of Iraqis, cost the US and Iraq billions of dollars, and increased negative feelings towards the US across the world [especially in the Middle East] for no just cause. What I'd like to know is whether anyone is going to be held accountable and what the legal punishment for their transgressions will actually be.

Growing up in Nigeria, I saw first hand what happens when the government commits crimes against the people with no fear of accountability. Lack of accountability seeps into the national fabric and varying degrees of corruption follow. Hopefully, America won't follow the example of the tin pot dictatorships across the third world where everyone knows the governments lie and are corrupt but shrug it off as being a way of life.

Bush and his cronies are destroying America and everything it stands for one day at a time. I pray we don't get four more years of this disaster.


 

Categories: Ramblings

For a long time I used to think the W3C held the future of the World Wide Web in its hands. However I have come to realize that although this may have been true in the past, the W3C has become too much of a slow-moving bureaucratic machine to attract the kind of innovation that will create the next generation of the World Wide Web. From where I sit there are three major areas of growth for the next generation of the World Wide Web: the next generation of the dynamic Web, syndication, and distributed computing across the Web. With the recent decisions of Mozilla and Opera to form the WHAT working group and Atom's decision to go with the IETF, it seems the W3C will not be playing a dominant role in any of these three areas.

In recent times the way the W3C produces a spec is either to hold a workshop where different entities can submit proposals and then form a working group tasked with unifying the various proposals, or to form a working group to come up with a unification of various W3C Notes submitted by member companies. Either way the primary mechanism the W3C uses to produce technology specs is to take a bunch of contradictory and conflicting proposals and then have a bunch of career bureaucrats try to find some compromise that is a union of all the submitted specs. There are two things that fall out of this process. The first is that the process takes a long time; for example, the XML Query workshop was in 1998 and six years later the XQuery spec is still a working draft. Similarly, the XInclude proposal was originally submitted to the W3C in 1999 but five years later it is just a candidate recommendation. Secondly, the specs that are produced tend to be too complex yet minimally functional since they compromise between too many wildly differing proposals. For example, W3C XML Schema was created by unifying the ideas behind DCD, DDML, SOX, and XDR. This has led to a dysfunctional specification that is too complex for simple scenarios and nigh impossible to use in defining complex XML vocabularies.

It seems many vendors and individuals are realizing that the way to produce an innovative technology is for the vendors that will be most affected by it to come up with a specification that is satisfactory to the participants, as opposed to trying to innovate by committee. This is exactly what is happening with the next generation of the dynamic Web with the WHAT working group, with XML Web Services with WS-I, and in syndication with RSS & Atom.

The W3C still has a good brand name since many associate it with the success of the Web but it seems that it has become damage that vendors route around in their bid to create the next generation of the World Wide Web.


 

Categories: XML

I often think to myself that there is a lot of background racism in the United States. By background racism, I mean racism that is so steeped into the culture that it isn't even noticed unless pointed out by outsiders. One example sprang to mind after reading Robert Scoble's post Did China beat Christopher Columbus by decades? where he writes

Speaking of Chinese, I'm reading a book "1421 The Year China Discovered America" that makes a darn good case that Christopher Columbus didn't discover America. He's done a ton of work that shows that the Chinese were actually here 60 years prior and that Christopher Columbus actually had copies of their maps!

That basically throws out a whole ton of history I learned in elementary school.

What I find interesting is this concept of "discovering America". There were already people on the North American continent when Columbus [or the Chinese] showed up in the 15th century. So "discovered" really means "first European people to realize the American continent existed". Now every child in America is brought up to believe that Europeans showing up on some land that was already inhabited by natives is "discovering America" and introducing it to the world.

This makes me wonder how much the history lessons I received growing up in Nigeria differ from the version British kids got about the African colonies. Perhaps there is also some white guy celebrated for having "discovered Africa" and civilizing the black savages he met when he got there. At least whatever tribes welcomed whoever he was aren't extinct today; too bad you can't say the same for the tribes that greeted Columbus.


 

Categories: Ramblings

July 6, 2004
@ 09:50 AM

Whenever I'm up late and can't sleep I like checking out HipHopMusic.com. I like reading the discussions; where else can I watch a discussion of a lawsuit filed against Snoop Doggy Dogg for using a voice mail message on his album deteriorate into gang bangers threatening to cap each other over the Web, or watch a review of Jay-Z's most recent album turn into a six-month-long discussion about whether Jay-Z is the greatest of all time (G.O.A.T.) or not?

Speaking of hip hop the new album by 213 (Snoop, Nate Dogg and Warren G), The Hard Way, is scheduled to drop within the next month. I've been waiting over 10 years for this album. There is a God in heaven listening to our prayers. :)


 

July 6, 2004
@ 07:48 AM

To help both our users and the RSS Bandit development team I have produced an RSS Bandit Product Roadmap. From the introduction to the roadmap

As the user base and development team of RSS Bandit has grown the need for planning and strategy documentation for future versions of RSS Bandit has become self evident. This document describes the goals for future versions of RSS Bandit as well as provide a plan for achieving these goals. As RSS Bandit is an Open Source project worked on by members of the RSS Bandit community in their free time these plans and goals will not be fixed which is why this document is a living document available on the wiki. There are three primary goals of this road map

  • it is a way to communicate to our end users what features are planned for the next release
  • it communicates the prioritization of various features as well as defines owners for feature areas
  • it defines the requirements of each feature in enough detail that the RSS Bandit development team can implement these features

At the current time this road map describes the future of the current version of RSS Bandit (v1.2.0.114) and the following version. Since we do not know what version number an RSS Bandit release will use until it is released, we will use code names for future releases. The release following v1.2.0.114 is codenamed Wolverine after the character from the X-Men comic book.

This document is primarily for the benefit of Torsten, Phil and myself although I am sure our users will find it of interest as well. If you have any feedback on the roadmap, let me know by posting a comment.


 

Categories: RSS Bandit

July 4, 2004
@ 08:40 PM

In The Problem With Online Music Tim Bray writes

The New York Times today hits the nail on the head: if you’re buying music over the net, you’re buying it in severely damaged condition. When I plug my computer into the really good stereo at home, the difference between the way music sounds coming off CD or vinyl or a good FM signal, and the crippled version from MP3 compression isn’t subtle. I used to think that if you were listening to music on headphones on a bus or train or plane or in a crowd, the MP3 lossage really didn’t matter much.

Then there are people like me who have a Bose sound system in the car but find out, much to our chagrin, that MP3s playing off an iPod sound better than CDs, since the iPod has an equalizer and the car stereo does not.


 

In chapter 26 of The Dilbert Principle, Scott Adams proposes guidelines for good management which should promote a healthy workplace. On pages 317 & 318 he writes

Rule for "one off" activities: consistency. Resist the urge to tinker. It's always tempting to "improve" the organizational structure, or to rewrite the company policy to address a new situation, or to create a committee to improve employee morale. Individually, all those things seem to make sense. But experience shows that you generally end up with something that is no more effective than what you started with.
...
The best example of a fruitless, "one off" activity that seems like a good idea is the reorganization. Have you ever seen an internal company reorganization that dramatically improved either the effectiveness of the employees or quality of the product?

I found this excerpt particularly relevant because at my day job [at Microsoft] reorganizations are a way of life. Based on conversations with coworkers and my experience of being in Redmond just over 2 years, the average product team goes through one reorganization a year. The average B0rg drone has the line between them and the CEO in the org chart altered at least once a year. At least one news publication has described it as Microsoft's Annual Spring Reorg. When I first got to Microsoft I'd heard people joke about this, but having gone through 2 changes in management and 2 team reorgs in as many years the jokes don't seem as far-fetched any more.

About a year ago [or longer] there was a 'meet-the-CEO' style meeting where Steve Ballmer gave a talk followed by a Q & A session with employees in one of the conference rooms. I didn't attend because I knew the room would be filled to capacity but did watch the live Webcast. One of the things he talked about was the negative effect of frequent reorgs. Paraphrasing, I believe he said that with reorgs we take teams of people who've learned how to work together and function as a team, and disrupt them by asking them to start the team building process all over again with new counterparts. Recently I noticed another detriment to regular reorganizations which involve changes of management.

Our team recently had a project post mortem where we discussed what had gone right or wrong on the journey towards getting to beta 1 of v2.0 of the .NET Framework. One of the things that came up was a set of issues with management decisions that were made at the start of the project and reinforced during the middle of the project by new management. What I found interesting was that the main management folks who had made these decisions weren't part of the post mortem because they'd been reorged away. Our team has had 3 general managers in the 2.5 years I've been working here, and we've been working on Whidbey for almost the entire time. What struck me is that the folks who are now off in other parts of the B0rg cube haven't had to see the pros and cons of the various decisions they made unfold over time, or what the ramifications of the various trade-offs and risky bets they made have been. The entire point of gaining experience is to learn from it. Frequent reorganizations prevent this learning process from occurring.

Thanks, Dilbert.  


 

Categories: Life in the B0rg Cube

At Microsoft, one of our goals in developing software is that backwards compatibility when moving from one version to the next is a high priority. However, in certain cases the old behavior may be undesirable enough that we break compatibility. Examples of such undesirable behavior are bugs that lead to incorrect results or security issues. Below is a list of breaking changes in the System.Xml namespace in beta 1 of v2.0 of the .NET Framework.

  1. Extension of xs:anyType that changes the content type to mixed="false" now results in an error.
  2. An enumeration member was added to XmlWriter.WriteState to indicate that the writer is in an error state; the writer now disallows further writes when in the error state.
  3. The ##other namespace constraint is now treated correctly on wildcards.
  4. Instances of the DateTime object returned by XmlValidatingReader for xs:time and other date- and time-related W3C XML Schema types are now initialized using DateTime.MinValue.
  5. The incorrect implementation of the XSD derivation hierarchy for xs:ENTITY and xs:NCName has been corrected.
  6. XSD list types are now validated correctly.
  7. XmlTextReader now reliably fails when the source stream switches encoding between calls to ResetState().
  8. XmlTextReader now applies the same security restrictions as XmlReaders created via the static XmlReader.Create() methods.

I was directly involved in the decision-making process for most of these breaking changes since many are in the W3C XML Schema area, which I am responsible for. If any further clarification about any of the breaking changes is needed, please post a comment with your question below.
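As a small example of what breaking change #2 looks like in practice, here's a sketch (assuming the beta 1 bits) of a writer being pushed into the error state; once there, further writes should be rejected rather than producing malformed XML.

    // Sketch of breaking change #2: once a write fails, the writer reports
    // WriteState.Error and disallows further writes.
    using System;
    using System.IO;
    using System.Xml;

    class WriterErrorStateDemo
    {
        static void Main()
        {
            XmlWriter writer = XmlWriter.Create(new StringWriter());
            writer.WriteStartElement("root");
            writer.WriteEndElement();
            try
            {
                writer.WriteStartElement("extra");   // invalid: a document can only have one root element
            }
            catch (InvalidOperationException)
            {
                Console.WriteLine(writer.WriteState);   // expected: Error (new in v2.0)
            }
            // Any subsequent write call now throws instead of emitting malformed XML.
        }
    }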


 

Categories: Life in the B0rg Cube | XML

July 4, 2004
@ 04:24 AM

I just noticed that the most recent version of RSS Bandit has had over 20,000 downloads in just over a month. It seems like every day I find a new blog post from someone who's switched to RSS Bandit or just started using it as their first aggregator describing their improved user experience compared to other aggregators. Thanks for the support, there is even better stuff on the way.

I've been busy with work so in the meantime Torsten and Phil have been working on bug fixes for some of our most pressing user requests. In between releases we try to produce stable builds which early adopters can test to see if certain persistent bugs have been fixed. These RSS Bandit daily builds do not have an installer but can be downloaded and run directly by double-clicking on the RssBandit.exe icon after unzipping the folder. For example, the 6/15/2004 build fixes an issue with downloading feeds from behind a firewall. It should be noted that these interim builds are not expected to be release quality and have not been tested as rigorously as a full release that ships with an installer. However if you are interested in keeping pace with RSS Bandit development and providing feedback in making it an even better aggregator then keeping up with our daily builds is one way to do that. 

My workload at my day job has eased, so in the next few weeks I'll have time to work with Torsten on fixing a number of our reported bugs as well as prioritizing and implementing various feature requests. One thing I have noticed is that a number of people would like to get insight into our plans for the next release. The first cut at this was my prioritized Top 10 list of features for the next version of RSS Bandit. However that list doesn't take into account a number of smaller feature items we'd like to do, nor does it give much detail about what we will do. Both Torsten and Phil would like to see proper specifications for the next release (requirements document, design docs, etc.) which I'd gladly write, since writing specs is something I was doing for fun before it became my day job. However this is work that would take away from coding time, and since there'd only be one or two other readers of the document(s) I'm unsure whether just adding more detail to our existing communication practices isn't a better bet for the long run.

Whatever is decided, we will be blogging about features as they are being implemented and providing builds which showcase the new features so we can get feedback from users.  The only question is how detailed we will be about discussing features before they actually show up as downloadable bits.


 

Categories: RSS Bandit

July 4, 2004
@ 01:20 AM

After being confused by a number of blog posts I read over the past few hours it just hit me that both Sun and Apple are using the code name Tiger for the next versions of their flagship software products. That clears up a bunch of confusion.


 

Categories: Technology

July 3, 2004
@ 06:49 PM

I just got back from vacation. A week on the beach, sans laptop, sipping mai tais is good for the soul.

Whenever I travel by air I try to use the flight time to catch up on reading popular fiction. This time around I planned to do something different and finish reading Michael Brundage's XQuery: The Xml Query Language but forgot it in my mad dash to the airport. I decided to fallback on a tradition I started a few years ago and searched for a book by Terry Pratchett at one of the airport bookstores. In the past year or two I have noticed that I have been unable to find books by Terry Pratchett in airport bookstores in the United States although I did buy some of his books at Heathrow airport last year. At first, I thought it was because he hadn't published anything new recently but I noticed books from authors that are much longer in the tooth like Jeffrey Archer, Jackie Collins, Robert Ludlum, Sidney Sheldon, Mario Puzo, Danielle Steele and Anne Rice. The conclusion I can draw is that there is some Clear Channel-like company that owns a majority of the bookstores located in airports in the United States which has placed an embargo on the works of Terry Pratchett. I ended up settling for the turgid prose of Anne Rice's Memnoch the Devil and Scott Adams' excellent The Dilbert Principle.

Memnoch the Devil was disappointing. I'd enjoyed the previous books in the series (Interview with the Vampire, The Vampire Lestat and Queen of the Damned) although I did find the subsequent book in the series, The Vampire Armand, quite dreadful and threw it away without finishing it. The book was fairly unimaginative [especially compared to what authors like Neil Gaiman have done with similar themes], predictable and most annoyingly inconsistent with the very religious works it was supposed to be based on.

The Dilbert Principle was very entertaining and had me introspective about work. I definitely feel there's a lot in the book that rings true about Microsoft, as is probably true of any large company. I did find some of his ideas on how to create an enjoyable and challenging workplace spot on, although I doubt they'll ever penetrate the consciousness of Corporate America.

*sigh*

I now need to catch up on email. Over 500 messages in my Yahoo! inbox (over 450 from the atom-syntax mailing list) and about 700 in my work inbox. Then there's the 1000 unread blog entries in RSS Bandit. Welcome to information overload...


 

Categories: Ramblings