Below is the list of the developers who got the Microsoft Most Valuable Professional (MVP) Award for the 2004-2005 calendar year in the XML category.

These developers have all been outstanding members of Microsoft's peer-to-peer communities.


 

Categories: Life in the B0rg Cube | XML

January 27, 2004
@ 04:33 PM

As Joshua Allen mentioned in his blog, our team recently had a WinFX Review. This is basically a design review with a number of top architects from across the .NET Framework to ensure that the API you are building is consistent with the design guidelines for an API that will be shipping in the next version of Windows (i.e. Longhorn). We got a lot of good feedback, which we are in the process of responding to and which has caused a few design changes. The good news is that we've come up with a story for XPathNavigator2, XPathEditor and XPathDocument2 that most people who've heard it are happy with.

After the review we were pinged by Anders Hejlsberg, who missed the original design review and asked if we could do a mini-review with just him. He gave lots of good feedback, questioned some of our scenarios and was quite amiable. I think he was mostly satisfied with the design decisions we'd made but thought we could do more in making processing XML dead easy, as opposed to the current situation where the developer needs to know a bit too much about XML and our programming model. He also talked about the tradeoffs of going to a cursor based model (XPathNavigator2/XPathEditor) from a tree based model (XmlNode) and the disconnects developers may feel once they make the shift. I suspect it will be similar to the disconnect developers initially felt when moving from MSXML & Java, which had a push-based model (SAX) for processing XML in a streaming fashion, to the .NET Framework, which uses a pull-based model (XmlReader). At first it was unfamiliar but once they started using it and saw the benefits they preferred it to the old way.
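To make the push versus pull distinction concrete, here is a rough sketch in Python rather than .NET. It uses the standard library's xml.sax module as a stand-in for a SAX-style push parser and xml.dom.pulldom as a stand-in for an XmlReader-style pull parser. The sample document and the function names are made up for illustration.

```python
import xml.sax
from xml.dom import pulldom

# Hypothetical sample document, used only for this illustration.
DOC = "<config><item name='a'/><item name='b'/></config>"


class ItemNames(xml.sax.ContentHandler):
    """Push model: the parser drives and calls back into our handler."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "item":
            self.names.append(attrs.getValue("name"))


def names_via_push(doc):
    handler = ItemNames()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.names


def names_via_pull(doc):
    """Pull model: our loop drives and asks the parser for the next event."""
    names = []
    for event, node in pulldom.parseString(doc):
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            names.append(node.getAttribute("name"))
    return names
```

In the pull version the calling code owns the loop, so it can stop early or keep its state in local variables instead of spreading it across handler callbacks.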

That said, we do need to think some more about how to better serve the “XML as config file format AKA CSV on steroids” demographic. A large number of developers see XML as nothing more than a format for configuration and log files, in which case a lot of the cruft of XML, such as entities, processing instructions and CDATA sections, is meaningless to them.


 

Categories: Life in the B0rg Cube

January 25, 2004
@ 05:47 PM

From Jamie Zawinski we learn

 Student Caught In Racial Controversy:

OMAHA, Neb. -- Four Westside High School students are suspended for promoting a white student for an African-American award. Flyers featured junior Trevor Richards, a South African native who moved to the United States in 1997.

Trevor said he is as African as anyone else.

This is just ridiculous. The kid is obviously African-American if the words are expected to be taken literally. Of course, like most euphemisms the words aren't meant to be taken literally but instead are supposed to map to a [sometimes unrelated] concept. I hope the kids take the school district to court.

Idiots.


 

I just registered on Orkut (Google's version of Friendster according to Slashdot), thanks to an invitation from Don Park -- thanks Don. Part of the registration process contained one of my pet peeves, a question about ethnicity that had [african american (black)] as one of the options, as if both terms are interchangeable. I almost picked [other] since the designers of the software seemed to think that people of African descent who aren't citizens of the United States weren't a large enough demographic to have their own option in the drop down list. I ended up going with [african american (black)] since I didn't want to confuse people who'd be looking up my profile.

Checking out a couple of people's friend networks, it seems the misgivings I had about Friendster, which kept me from using it when I first heard about it, are accurate if Orkut is anything like it. Online folks seem to have a weird definition of friend. When I think friend, I think of someone you'd give a call and say "Hello, I just killed someone" and after a pause their response is "Shit, so what are we going to do about the body?" That isn't to be taken literally but you get the idea. A friend is someone you'd go to the ends of the Earth for and who'd do the same for you. People with whom my primary interaction involves reading their weblogs and exchanging mail on various mailing lists don't really fall into the "friend" category for me. Lumping those people together with folks I've known all my life just seems wrong to me. Those are folks who've been with me through thick and thin, who've done things like let me hold their bank card with the PIN to use in case of emergencies when I was broke, trusted me to come up with my share of the rent and bills when I had no job and no prospects because I gave my word, and helped me get out of trouble when I thought I was in over my head.

There are acquaintances, friends and folks I'd die for. Lumping them all into one uber-category called friends just doesn't jibe with me. I'll play with the site some more later today but I doubt I'll be on it for long. I've got some stuff coming in from IKEA this morning.


 

Categories: Ramblings

January 24, 2004
@ 05:20 PM

Don't you just love temporal searches?


 

Categories: RSS Bandit

January 23, 2004
@ 05:05 PM

There's been some recent surprise by blogcrazy about the recent Democratic Party caucus in Iowa, in which John Kerry won 38 percent of the state convention delegates, with 32 percent for John Edwards, 18 percent for Howard Dean and 11 percent for Gephardt. Many had assumed that Howard Dean's highly successful Internet campaign, with its adoption of blogging technologies and support by many bloggers, was an indication of strong grass roots support. Yeah, right.

Robert Scoble wrote

Along these lines, by tomorrow, I'm sure there'll be more than one person that will gnash their teeth and write "weblogs failed Dean."

Well, the weblog hype did get overboard the past few weeks. Weblogs do matter. Why? The influentials read weblogs. The press. The insiders. The passionate ones.

But, the average Joe doesn't read these. Come on, be real. Instapundit gets, what, 100,000 to 200,000 visitors a day? I get 2,000. That's a small little dinky number in a country of 290 million.

Weblogs and online technologies have helped Dean and others collect a lot of money, but you still gotta have a TV persona that hits home. Just reality in 2004. I'm not bitter about that.

The lessons for big-company evangelism (or small company, for that matter) are the same. If your product isn't something that average people like, it doesn't matter how good the weblogs are.

Considering that Robert Scoble is one of the weblog hypesters who may have gone “overboard” as he puts it, I find his post particularly telling. Folks like Robert Scoble have trumpeted that weblogs would be triumphant against traditional marketing, and in many posts he's berated product teams at the company he works for [Microsoft] that don't consider weblogging as part of their marketing message. Weblogs are currently a fairly low cost way of communicating with a certain class of internet savvy people. However, nothing beats traditional communication channels such as television, billboards and the print media for spreading a message amongst all and sundry.

Don't be blinded by the hype.

There's one other response to the recent events in Iowa that made me smile. Doc Searls wrote


I see that my positive spin yesterday on Howard Dean's "barbaric yawp" speech got approximately no traction at all. Worse, the speech was (predictably) mocked by everybody in the major media from Stern in the morning to Letterman and Leno in the evening. 

  Clearly, its effects were regretable. It hurt the campaign. But it was also honest and authentic, and in the long run that can only help, for the simple reason that it was real.
  So. What to do? 
  Here's my suggestion... Look at media coverage as nothing more than transient conditions, like weather. And navigate by the stars of your own constituency.
  The main lesson from Cluetrain is "smart markets get smarter faster than most companies." The same goes for constituencies and candidates. Your best advice will come from the people who know you best, who hear your voice, who understand the missions of your campaign and write about it clearly, thoughtfully and with great insight. They're out there. Your staff can help you find them. Navigate by their stars, not the ones on television. 

I've always found people who espouse the Cluetrain Manifesto to be particularly naive about the realities of markets and marketing. Telling someone to ignore media coverage and keep it real is not how elections are won in America. Any student of recent American history knows the increased significance of the media in presidential elections ever since the televised Kennedy-Nixon debates of the 1960 election.

Those who do not learn from history are doomed to repeat it.


 

Categories: Ramblings

Being a hobbyist developer interested in syndication technologies, I'm always on the lookout for articles that provide useful critiques of the current state of the art. I recently stumbled on an article entitled 10 reasons why RSS is not ready for prime time by Dylan Greene which fails to hit the mark in this regard. Of the ten complaints, about three seem like real criticisms grounded in fact while the others seem like contrived issues which ignore reality. Below is the litany of issues the author brings up and my comments on each.

1) RSS feeds do not have a history. This means that when you request the data from an RSS feed, you always get the newest 10 or 20 entries. If you go on vacation for a week and your computer is not constantly requesting your RSS feeds, when you get back you will only download the newest 10 or 20 entries. This means that even if more entires were added than that while you were gone, you will never see them.

Practically every information medium has this problem. When I travel out of town for a week or more I miss the newspaper, my favorite TV news shows and talk radio shows which once gone I'll likely never get to enjoy. With blogs it is different since most blogs provide an online archive but on the other hand most news sites archive their content and require paid access after they're no longer current.

In general, most individual blogs aren't updated frequently enough that being gone for a week or two means that entries are missed. On the other hand, most news sites are. In such cases you could leave your aggregator of choice connected, reduce its refresh rate (to something like once a day) and let your fingers do the walking. That's exactly what you would have to do with a TiVo (i.e. leave your cable box on).

2) RSS wastes bandwidth. When you "subscribe" to an RSS feed, you are telling your RSS reader to automatically download the RSS file on a set interval to check for changes. Lets say it checks for news every hour, which is typical. Even if just one item is changed the RSS reader must still download the entire file with all of the entries.

The existing Web architecture provides a couple of ways for polling based applications to save bandwidth, including HTTP conditional GET and gzip compression over HTTP. Very few web sites actually support both of these well-known bandwidth saving techniques, and that includes Dylan Greene's site, based on a quick check with Rex Swain's HTTP Viewer. Using both techniques can cut bandwidth costs by an order of magnitude (by a factor of 10 for the mathematically challenged). It'd be nice if website administrators actually used existing best practices before coming up with sophisticated hacks for perceived problems and reinventing the wheel in more complex ways.

That said, it would be a nice additional optimization for web sites to provide only the items that hadn't been read by a particular client on each request for the RSS feed. However, I'd like to see us learn to crawl before we try to walk.
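For the curious, here is a minimal sketch of the client side of conditional GET plus gzip. It only builds request headers and interprets a response, so it runs without touching the network. The helper names and the shape of the cached values are my own assumptions, not code from any particular aggregator.

```python
import gzip


def conditional_headers(etag=None, last_modified=None):
    """Headers for a polite poll: accept gzip and send back the validators
    (ETag / Last-Modified) the server gave us on the previous fetch."""
    headers = {"Accept-Encoding": "gzip"}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def read_feed(status, response_headers, body, cached_body):
    """Return the feed bytes to use, given the pieces of an HTTP response."""
    if status == 304:
        # Not Modified: the server sent no body, so reuse the cached copy.
        return cached_body
    if response_headers.get("Content-Encoding", "").lower() == "gzip":
        return gzip.decompress(body)
    return body
```

A 304 response carries no feed at all, which is where most of the savings come from for a feed that changes a few times a day but gets polled every hour.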

3) Reading RSS requires too much work. Today, in 2004, we call it "browsing the Web" - not "viewing HTML files". That is because the format that Web pages happen to be in is not important. I can just type in "msn.com" and it works. RSS requires much more than that: We need to find the RSS feed location, which is always labeled differently, and then give that URL to my RSS reader.

Yup, there isn't a standard way to find the feed for a website. RSS Bandit tries to make this easier by supporting feed lookup via Syndic8 and one click subscription to RSS feeds. However, aggregator authors can't do this alone; the blogging tools and major websites that use RSS need to get in on the act as well.

4) An RSS Reader must come with Windows. Until this happens too, RSS reading will only be for a certain class of computer users that are willing to try this new technology. The web became mainstream when Microsoft started including Internet Explorer with Windows. MP3's became mainstream when Windows Media Player added MP3 support.

I guess my memory is flawed but I always thought Netscape Navigator and Winamp/Napster were the applications that brought the Web and MP3s to the mainstream respectively. I'm always amused by folks that think that unless Microsoft supports some particular technology then it is going to fail. It'd be nice if an RSS aggregator shipped in Windows but that doesn't mean that a technology cannot become popular until it does. Being a big company, Microsoft is generally slow to react to trends until they've proven themselves in the market, which means that if an aggregator ever ships in Windows it will happen when news aggregators are mainstream, not before.

5) RSS content is not User-Friendly. It has taken about 10 years for the Web to get to the point where it is today that most web pages we visit render in our browser the way that the designer intended. It's also taken about that long for web designers to figure out how to lay out a web page such that most users will understand how to use it. RSS takes all of that usability work and throws it away. Most RSS feeds have no formatting, no images, no tables, no interactive elements, and nothing else that we have come to rely on for optimal content readability. Instead we are kicked back to the pre-web days of simple text.

I find it hard to connect tables, interactive elements and images with “optimal content readability” but maybe that's just me. Either way, there's nothing stopping folks from using HTML markup in RSS feeds. Most of the major aggregators are either browser based or embed a web browser so viewing HTML content is not a problem. Quite frankly, I like the fact that I don't have to deal with cluttered websites when reading content in my aggregator of choice.

6) RSS content is not machine-friendly. There are search engines that search RSS feeds but none of them are intelligent about the content they are searching because RSS doesn't describe the properties of the content well enough. For example, many bloggers quote other blogs in their blog. Search engines cannot tell the difference between new content and quoted content, so they'll show both in the search results.

I'm curious as to which search engine he's used which doesn't have this problem. Is there an “ignore items that are parts of a quote” option on Google or MSN Search? As search engines go I've found Feedster to be quite good and better than Google for a certain class of searches. It would be cool to be able to execute ad-hoc, structured queries against RSS feeds but this would be icing on the cake, and in fact it is much more likely to happen in the next few years than it is that we will ever be able to perform such queries against [X]HTML web sites.
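As a rough idea of what a structured query against a feed could look like, here is a small sketch using the limited XPath subset in Python's xml.etree.ElementTree. The feed content and the function name are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, invented for this example.
FEED = """<rss version="2.0"><channel>
<title>Example</title>
<item><title>XML rocks</title><category>XML</category></item>
<item><title>Moving day</title><category>Ramblings</category></item>
<item><title>XPath tips</title><category>XML</category></item>
</channel></rss>"""


def titles_in_category(feed_xml, category):
    """Return the titles of all items filed under the given category."""
    root = ET.fromstring(feed_xml)
    # The [tag='text'] predicate selects items that have a <category>
    # child whose text equals the requested value.
    items = root.findall(f"./channel/item[category='{category}']")
    return [item.findtext("title") for item in items]
```

This kind of query works because every feed puts titles and categories in the same predictable elements, which is exactly what tag soup HTML doesn't give you.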

7) Many RSS Feeds show only an abridged version of the content. Many RSS feeds do not include the full text. Slashdot.org, one of the most popular geek news sites, has an RSS feed but they only put the first 30 words of each 100+ word entry in their feed. This means that RSS search engines do not see the full content. This also means that users who syndicate their feed only see the first few words and must click to open a web browser to read the full content.

This is annoying but understandable. Such sites are primarily using an RSS feed as a way to lure you to the site not as a way to provide users with content. I don't see this as a problem with RSS any more than the fact that some news sites need you to register or pay to access their content is a problem with HTML and the Web.

8) Comments are not integrated with RSS feeds. One of the best features of many blogs is the ability to reply to posts by posting comments. Many sites are noteworthy and popular because of their comments and not just the content of the blogs.

Looks like he is reading the wrong blogs and using the wrong aggregators. There are a number of ways to expose comments in RSS feeds and a number of aggregators support them including RSS Bandit which supports them all.
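To give a flavor of how comments can show up in a feed, here is a sketch that reads three conventions I'm assuming for the sake of illustration: the RSS 2.0 comments element, wfw:commentRss from the CommentAPI namespace and slash:comments from the Slash module. The sample item and the function name are made up, and this isn't meant as a list of what any particular aggregator implements.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the two extension modules assumed in this sketch.
NS = {
    "wfw": "http://wellformedweb.org/CommentAPI/",
    "slash": "http://purl.org/rss/1.0/modules/slash/",
}

# Hypothetical item, invented for this example.
ITEM = """<item xmlns:wfw="http://wellformedweb.org/CommentAPI/"
      xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
  <title>Example post</title>
  <comments>http://example.com/post/1#comments</comments>
  <wfw:commentRss>http://example.com/post/1/comments.rss</wfw:commentRss>
  <slash:comments>3</slash:comments>
</item>"""


def comment_info(item_xml):
    """Collect the comment page URL, comment feed URL and comment count."""
    item = ET.fromstring(item_xml)
    count = item.findtext("slash:comments", namespaces=NS)
    return {
        "page": item.findtext("comments"),
        "feed": item.findtext("wfw:commentRss", namespaces=NS),
        "count": int(count) if count is not None else None,
    }
```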

9) Multiple Versions of RSS cause more confusion. There's several different versions of RSS, such as RSS 0.9, RSS 1.0, RSS 2.0, and RSS 3.0, all controlled by different groups and all claiming to be the standard. RSS Readers must support all of these versions because many sites only support one of them. New features can be added to RSS 1.0 and 2.0 can by adding new XML namespaces, which means that anybody can add new features to RSS, but this does mean that any RSS Readers will support those new features.

I assume he has RSS 3.0 in there as a joke. Anyway, the existence of multiple versions of RSS is not that much more confusing to end users than the existence of multiple versions of [X]HTML, HTTP, Flash and JavaScript, not all of which are supported by every web browser.

That said a general plugin mechanism to deal with items from new namespaces would be an interesting problem to try and solve but sounds way too hard to successfully provide a general solution for.

10) RSS is Insecure. Lets say a site wants to charge for access to their RSS feed. RSS has no standard way for inputing a User Name and Password. Some RSS readers support HTTP Basic Authentication, but this is not a secure method because your password is sent as plain text. A few RSS readers support HTTPS, which is a start, but it is not good enough. Once somebody has access to the "secure" RSS file, that user can share the RSS file with anybody.

Two points. (A) RSS is a Web technology so the standard mechanisms for providing restricted yet secure access to content on the Web apply to RSS and (B) there is no way known to man short of magic to provide someone with digital content on a machine that they control and restrict them from copying it in some way, shape or form.
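On point (A), the complaint about HTTP Basic Authentication is easy to see in code: the credentials are only base64 encoded, so anyone who can read the request can reverse them. That's why the standard Web answer is to send them over HTTPS. A small sketch, with made-up helper names and credentials:

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an Authorization header for HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header_value):
    """Recover the credentials; no key is needed since base64 is only an encoding."""
    _scheme, token = header_value.split(" ", 1)
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```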

Aight, that's all folks. I'm off to watch TiVoed episodes of the Dave Chappelle show.


 

Categories: XML

January 21, 2004
@ 08:56 PM

Thanks to Technorati Beta, I found Aaron Skonnard's blog.  Aaron is the author of the XML Files column in the MSDN Magazine and an all around XML geek.


 

Categories: XML

Mark Pilgrim has a post entitled The history of draconian error handling in XML where he excerpts a couple of the discussions on the draconian error handling rules of XML. These rules state that if an XML processor encounters a syntax error in an XML document it should stop parsing and indicate a fatal error, as opposed to muddling along or trying to fix up the error in some way. According to Tim Bray:

What happened was, we had a really big, really long, really passionate argument on the subject; the camps came to be called “Draconians” and “Tolerants.” After this had gone on for some weeks and some hundreds of emails, we took a vote and the Draconians won 7-4.

Reading some of the posts from 6 years ago excerpted on Mark Pilgrim's blog, it is interesting to note that most of the arguments on the side of the Tolerants are simply no longer relevant today, while the position of the Draconians turned out to be the reason for XML's current widespread success in the software marketplace.
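Draconian error handling is easy to observe with any conforming parser. The sketch below uses Python's xml.etree.ElementTree, where a well-formedness error raises ParseError and no partial tree is handed back. The wrapper function is made up for the example.

```python
import xml.etree.ElementTree as ET


def parse_strict(text):
    """Return (root, None) for well-formed XML or (None, message) on a fatal error."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        # The parser stops at the first well-formedness error;
        # nothing parsed so far is returned to the caller.
        return None, str(err)
```

Feeding it `<a><b/></a>` yields a tree, while `<a><b></a>` with its mismatched tag yields only an error message, and a conforming parser on any other platform is expected to make the same call.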

The original goal of XML was to create a replacement for HTML which allowed you to create your own tags yet have them work in some fashion on the Web (i.e. SGML on the Web). Time has shown that placing XML documents directly on the Web for human consumption just isn't that interesting to the general Web development community. Most content on the Web for human consumption is still HTML tag soup. Even when Web content claims to be XHTML it often is really HTML tag soup, either because it isn't well-formed or because it is invalid according to the XHTML DTD. Even applications that represent data internally as XML tend to use XSLT to transform the content to HTML as opposed to putting the XML directly on the Web and styling it with CSS. As I've mentioned before, the dream of the original XML working group of replacing HTML by inventing “SGML on the Web” is a failed dream. In hindsight it doesn't seem that the choice of tolerant over draconian error handling would have made a difference to the lack of adoption of XML as a format for representing content targeted for human consumption on the Web today.

On the other hand, XML has flourished as a general data interchange format for machine-to-machine interactions in wide ranging areas, from distributed computing and database applications to being a format for describing configuration files and business documents. There are a number of reasons for XML's rise to popularity:

  1. The ease with which XML technologies and APIs enabled developers to process documents and data, in a more flexible manner than with previous formats and technologies.
  2. The ubiquity of XML implementations and the consistency of the behavior of implementations across platforms.
  3. The fact that XML documents were fairly human-readable and seemed familiar to Web developers since it was HTML-like markup.

Considering the above points, does it seem likely that XML would be as popular outside of its original [failed] design goal of being a replacement for HTML if the specification allowed parsers to pick and choose which parts of the spec to honor with regards to error recovery? Would XML Web Services be as useful for interoperability between platforms if different parser implementations could recover from syntax errors at will in a non-deterministic manner? Looking at some of the comments linked from Mark Pilgrim's blog it does seem to me that a lot of the arguments on the side of the Tolerants came from the perspective of “XML as an HTML replacement” and don't stand up under scrutiny in today's world.

April 19, 1997. Sean McGrath: Re: Error Handling in XML

Programming languages that barf on a syntax error do so because a partial executable image is a useless thing. A partial document is *not* a useless thing. One of the cool things about XML as a document format is that some of the content can be recovered even in the face of error. Compare this to our binary document friends where a blown byte can render the entire content inaccessible.

Given that today XML is used for building documents that are effectively programs such as XSLT, XAML and SVG it does seem like the same rules that apply for partial programs should apply as well.

May 7, 1997. Paul Prescod: Re: Final words, I think, on error handling

Browsers do not just need a well-formed XML document. They need a well-formed XML document with a stylesheet in a known location that is syntactically correct and *semantically correct* (actually applies reasonable styles to the elements so that the document can be read). They need valid hyperlinks to valid targets and pretty soon they may need some kind of valid SGML catalog. There is still so much room for a document author to screw up that well-formedness is a very minor step down the path.

I have to agree here with the spirit of the post [not the content since it assumed that XML was going to primarily be a browser based format]. It is far more likely and more serious that there are logic errors in an XML document than syntax errors. For example, there are more RSS feeds out there with dates that are invalid based on the RSS spec they support than there are ill-formed feeds. And in a number of these cases it is a lot easier to fix the common well-formedness errors than it is to fix violations of the spec (HTML in descriptions or titles, incorrect date formats, data other than email addresses in the <author> element, etc).
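To illustrate the difference between the two kinds of error, here is a sketch that takes a perfectly well-formed feed and flags pubDate values that don't look like the RFC 822 dates RSS 2.0 calls for. It leans on email.utils.parsedate_tz from the Python standard library, which is fairly lenient, so treat it as a rough check rather than a validator. The sample feed and the function name are invented.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_tz

# Hypothetical feed: well-formed XML, but the second date is not RFC 822.
FEED = """<rss version="2.0"><channel>
<item><title>Good</title><pubDate>Wed, 21 Jan 2004 20:56:00 GMT</pubDate></item>
<item><title>Bad</title><pubDate>2004-01-21T20:56:00</pubDate></item>
</channel></rss>"""


def invalid_pubdates(feed_xml):
    """Return pubDate strings that parse as XML text but not as RFC 822 dates."""
    root = ET.fromstring(feed_xml)  # raises ParseError only if the XML is ill-formed
    bad = []
    for item in root.iter("item"):
        text = item.findtext("pubDate")
        if text is not None and parsedate_tz(text) is None:
            bad.append(text)
    return bad
```

The XML parser is happy with both items; only a check that knows what the spec says about dates catches the second one.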

May 7, 1997. Arjun Ray: Re: Final words, I think, on error handling

The basic point against the Draconian case is that a single (monolithic?) policy towards error handling is a recipe for failure. ...

XML is many things but I doubt that one could call it a failure except when it comes to its original [flawed] intent of replacing HTML. As a mechanism for describing structured and semi-structured content in a robust, platform independent manner IT IS KING.

So why do I say everyone lost yet everyone won? Today most XML on the Web targeted at human consumption [i.e. XHTML] isn't well-formed, so in this case the Tolerants were right and the Draconians lost since well-formed XML has been a failure on the human Web. However, in the places where XML is getting the most traction today, the draconian error handling rules promote interoperability and predictability, which is the opposite of what a number of the Tolerants expected would happen with XML in the wild.


 

Categories: XML

By default RSS Bandit isn't registered as the handler for the "feed" URI scheme until this feature is enabled. Once this feature is enabled, clicking on URIs such as feed:http://www.25hoursaday.com/weblog/SyndicationService.asmx/GetRss in your browser of choice should launch RSS Bandit's new feed dialog. The screenshot below shows how to do this from the Options dialog.

Future releases of RSS Bandit will have this enabled by default.
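For anyone wondering what a handler for the "feed" scheme has to do with such a URI, the gist is to strip the prefix and subscribe to what's left. This is a sketch of that idea and not RSS Bandit's actual code; the function name is made up, and treating the feed://host/path form as plain http is my own assumption.

```python
def feed_uri_to_url(uri):
    """Turn a feed: URI into the URL of the feed it points at."""
    prefix = "feed:"
    if not uri.lower().startswith(prefix):
        raise ValueError("not a feed: URI")
    target = uri[len(prefix):]
    if target.startswith("//"):
        # Assumption: feed://host/path is shorthand for http://host/path.
        target = "http:" + target
    return target
```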


 

Categories: RSS Bandit