February 9, 2005
@ 03:05 PM

David Megginson (the creator of SAX) has a post entitled The complexity of XML parsing APIs where he writes

Dare Obasanjo recently posted a message to the xml-dev mailing list as part of the ancient and venerable binary XML permathread (just a bit down the list from attributes vs. elements, DOM vs. SAX, and why use CDATA?). His message including the following:

I don’t understand this obsession with SAX and DOM. As APIs go they both suck[0,1]. Why would anyone come up with a simplified binary format then decide to cruft it up by layering a crufty XML API on it is beyond me.

[0] http://www.megginson.com/blogs/quoderat/archives/2005/01/31/sax-the-bad-the-good-and-the-controversial/

[1] http://www.artima.com/intv/dom.html

I supposed that I should rush to SAX’s defense. I can at least point to my related posting about SAX’s good points, but to be fair, I have to admit that Dare is absolutely right – building complex applications that use SAX and DOM is very difficult and usually results in messy, hard-to-maintain code.

I think this is a pivotal part of the binary XML debate. The primary argument for binary serializations of XML is that certain parties want to get the benefit of the wide array of technologies for processing XML yet retain the benefits of a binary format such as reduced size on the wire and processing time. Basically having one's cake and eating it too.

For me, the problem is that XML is already being pulled from too many directions as it is. In retrospect I realize it was foolish for me to think that the XML team could come up with a single API that would satisfy people processing business documents written in  wordprocessingML, people building distributed computing applications using SOAP or developers reading & writing to application configuration files. All of these scenarios use intersecting subsets of the full functionality of the XML specification. The SOAP specs go as far as banning some features of XML while others are simply frowned upon based on the fact that the average SOAP toolkit simply doesn't know what to do with them. One man's meat (e.g. mixed content) is another man's poison.

What has ended up happening is that we have all these XML APIs that expose a lot of cruft of XML that most developers don't need or even worse make things difficult in the common scenarios because they want to support all the functionality of XML. This is the major failing of APIs such as the .NET Framework's pull model parser class, System.Xml.XmlReader, DOM and SAX. The DOM also has issues with the fact that it tries to support conflicting data models (DOM vs. XPath) and serialization formats (XML 1.0 & XML 1.0 + XML namespaces). At the other extreme we have APIs that try to simplify XML by only supporting specific subsets of its expressivity such as the System.Data.DataSet and the System.Xml.XmlSerializer classes in the .NET Framework. The problem with these APIs is that the developer is dropped of a cliff once they reach the limits of the XML support of the API and have to either use a different API or resort to gross hacks to get what they need done. 

Unfortunately one of the problems we had to deal with when I was on the XML team was that we already had too many XML APIs as it was. Introducing more would create developer confusion but trying to change the existing ones would break backwards compatibility. Personally I'd rather see efforts being to create better toolkits and APIs for the various factions that use XML to make it easier for them to get work done than constantly churning the underlying format thus fragmenting it.


 

Categories: XML

I got an email from Shelly Farnham announcing a Social Computing Symposium, sponsored by Microsoft Research which will be held at Redmond Town Center on April 25-26. Below is an excerpt from the announcement

In the spring of 2004 a small two-day Social Computing Symposium, sponsored by Microsoft Research, brought together researchers, builders of social software systems, and influential commentators on the social technology scene...A second symposium at Microsoft Research is planned for April 25th-26th, 2005...

If you are interested in attending the symposium, please send a brief, 300-500 word position paper. The symposium is limited to 75 people, and participants will be selected on the basis of submitted position papers.

Position papers should not be narrowly focused on either academic study or industry practice. Rather, submissions should do one or more of the following: address theoretical and methodological issues in the design and development of social computing technologies; reflect on concrete experiences with mobile and online settings; offer glimpses of novel systems; discuss current and evolving practices; offer views as to where research is needed.

We are particularly interested in position papers that explore any of the following areas. However, given the symposium’s focus on new innovation in social technologies, we are open to other topics as well.

a) The digitization of identity and social networks.

b) Proliferation and use of social metadata.

c) Mobile, ubiquitous social technologies changing the way we socialize.

d) Micropublishing of personal content (e.g. blogs), and the democratization of information exchange and knowledge development.

e) Social software on the global scale: the impact of cross-cultural differences in experiences of identity and community.

Please send your symposium applications to scspaper@microsoft.com by February 28th.

I would like to attend which means I have to cough up a position paper. I have 3 ideas for position papers; (i) Harnessing Latent Social Networks: Building a Better Blogroll with XFN and FOAF, (ii) Blurring the Edges of Online Communication Forms by Integrating Email, Blogging and Instant Messaging or (iii) Can Folksonomies Scale to Meet the Challenges of the Global Web?

So far I've shared this ideas with one person and he thought the first idea was the best. I assume some of the readers of my blog will be at this symposium. What would you guys like to get a presentation or panel discussion on?


 

February 9, 2005
@ 01:43 PM

The team I work for at MSN has open developer and test positions. Our team is responsible for the underlying infrastructure of services like MSN Messenger, MSN Spaces, MSN Groups and some of Hotmail. We frequentlu collaborate with other properties across MSN such as MyMSN and MSN Search as well as with other product units across Microsoft such as in the case of Outlook Live. If you are interested in building world class software that is used by hundreds of millions of people and the following job descriptions interest you then send me your resume

Software Design Engineer (Developer)

The team builds key platforms that facilitate storing and sharing photo albums, files, blogs, calendars, buddy lists, favorites, etc. We are looking for a strong software developer with good problem solving skills and at least 2-3 years of experience with C++ or other programming languages. Experience in relational databases especially data modeling, schema design, developing stored procedures and programming against the databases is required. Candidates with good SQL knowledge and performance tuning databases will be preferred.

Software Design Engineer/Test (Tester)

The MSN Communication and Platform Services team is looking for a passionate and experienced SDET candidate with a strong programming, design, and server background to help us test next generation internet sharing and communication scenarios at MSN. You will be responsible for designing and creating the test automation and tools for the MSN Contacts service, which stores over 9 billion live MSN Hotmail and MSN Messenger contacts for over 400 million users. You should be able to demonstrate thorough understanding of testing methodologies and processes. Requirements for this position include strong skills in C++ or C#, design, problem solving, troubleshooting, proven experience with test case generation techniques or model based testing methodologies. communication (written & verbal). and a strong passion for quality. Your position will include key tasks such as writing test plans and test cases, working with PM and Development to provide them with integration and architecture feedback, working across teams with major partners such as Hotmail, Office, and Windows, and driving quality and performance related issues to resolution.

Email your resume to dareo@msft.com (replace msft with microsoft) if the above job descriptions sound like they are a good fit for you. If you have any questions about what working here is like, you can send me an email and I'll either follow up via email or my blog to answer any questions of general interest [within reason].


 

Categories: Life in the B0rg Cube | MSN

February 9, 2005
@ 12:27 PM

Abbie, who is one of the coolest MSN Spaces users around, has a posted a collection of links to various posts showing how to get extra mileage out of MSN Spaces. Check out her post MSN Spaces Tips, Tricks, Gods and More . Some of my favorite links from her page include

Alerts For Your Space - want to set up alerts are learn how they work? Read here!

Edit It! Button - Scott's trick for obtaining additional blog editing features.

Guide to Trackbacks - What are trackbacks and how do you use them?

Edit! Help for FireFox Users - Some editing perks for your FireFox users.

Understanding Layout Customization - Learn your way around customizing your Space.

Minimizing Content Spam - Great post by Mike regarding spam in your Space.

Podcasting Your Space - Great information on how to set up your own podcast of your Space

Deleting Your Space - What really happens when you delete your Space?

Take Too Long and Lose It - Did you know you could lose your post if you take too long to type it out?  Read and learn how to prevent it.

Add A GuestBook - a unique way to add a guestbook to your Space

Give Your FeedBack and Ideas for Spaces - Have an idea for Spaces? Get it heard here!

Who Owns Your Spaces Content - a small but great FYI post regarding your content

There is at least one neat trick that Abbie missed from her list. Jim Horne shows how to embed audio and video into a blog post on a Space. Excellent.


 

Categories: Mindless Link Propagation | MSN

February 7, 2005
@ 02:24 PM
If you're black Iraqi, you gotta look at America a little bit different. You gotta look at America like the uncle who paid for you to go to college ... but molested you."

The above sentence, slightly modified from a quote in Chris Rock's Never Scared, sums up my thoughts on the elections in Iraq.


 

Categories: Ramblings

We are now feature complete for the next release of RSS Bandit. Interested users can download it at RssBandit.1.3.0.18.Wolverine.Beta.zip. All bugs specific to the previous alpha release have been fixed. The next steps for us are finish working on documentation, translations and fixing any bugs that are reported during the beta with the intention of shipping a final release in the next two weeks.

Check out a screenshot of the Wolverine Beta which shows some of the new features such as newspaper styles and the additional columns in the listview. The major new features are listed below but there are couple of minor ones such as proxy exclusion lists and better handling of SSL certificate issues which aren't listed. There will be a comprehensive list of bug fixes and new features in the announcement for the final release.

NEW FEATURES (compared to v1.2.0.117)

  • Newspaper styles: Ability to view all unread posts in feeds and categories or all posts in a search folder in a Newspaper view. This view uses templates that are in the same format as those used by FeedDemon so one can use RSS Bandit newspaper styles in FeedDemon and vice versa. One can choose to specify a particular stylesheet for a given feed or category. For example, one could use the slashdot.fdxsl stylesheet for viewing Slashdot, headlines.fdxsl for viewing news sites in the News category and outlook-express.fdxsl for viewing blog posts.  

  • Column Chooser: Ability to specify which columns are shown in the list view from a choice of Headline, Topic, Date, Author, Comment Count, Enclosures, Feed Title and Flag. One can choose to specify a column layout for a particular feed or category. For example, one could use the layout {Headline, Author, Topic, Date, Comment Count} for a site like Slashdot but use one like {Headline, Topic, Date} for a traditional blog that doesn't have comments.

  • Category Properties: It is now possible to specify certain properties for all feeds within a category such as how often they should be updated or how long posts should be kept before being removed. 

  • Identities: One can create multiple personas with associated information (homepage, name, email address, signature, etc) for use when posting comments to weblogs that support the CommentAPI.

  • del.icio.us Integration: Users who have accounts on the del.icio.us service can upload links to items of interest directly from RSS Bandit.  

  • Skim Mode: Added option to 'Mark All Items As Read on Exiting a Feed' 

  • Search Folder Improvements: Made the following additions to the context menu for search folders; 'New Search Folder', 'Refresh Search', 'Mark All Items As Read' and 'Rename Search Folder'. Also deletion of a search folder now prompts the user to prevent accidental deletion

  • Item Deletion: News items can be deleted by either using the [Del] key or using the context menu. Deleted items go to the "Deleted Items" special folder and can be deleted permanently by emptying the folder or restored to the original feed at a later date.

  • UI Improvements: Tabbed browsers now use a multicolored border reminiscent of Microsoft OneNote.

POSTPONED FEATURES (will be in the NightCrawler release)

  • NNTP Support
  • Synchronizing state should happen automatically on startup/shutdown
  • Applying search filters to the list view 
  • Provide a way to export feeds from a category

 

Categories: RSS Bandit

When I used to work on the XML team at Microsoft, there were a number of people who I interacted with who were so smart I used to feel like I learned something new everytime I walked into their office. These folks include

  • Michael Brundage - social software geek before it was a hip buzzword, XML query engine expert and now working on the next generation of XBox

  • Joshua Allen - semantic web and RDF enthusiast, if not for him I'd dismiss the semantic web as a pipe dream evangelized by a bunch of theoreticians who wouldn't know real code if it jumped them in the parking lot and defecated on their shoes, now works at MSN but not on anything related to what I work on

  • Erik Meijer - programing language god and leading mind behind X# Xen Cω , he is a co-inventor on all my patent applications most of which started off with me coming into his office to geek out about something I was going to blog about

  • Derek Denny-Brown - XML expert from back when Tim Bray and co. were still trying to figure out what to name it, one heckuva cool cat

Anyway that was a bit of digression before posting the link mentioned in the title of the post. Michael Brundage has an essay entitled Working at Microsoft  where he provides some of his opinions on the good, the bad, and the in-between of working at Microsoft. One key insight is that Microsoft tends to have good upper management and poor middle management. This insight strikes very close to home but I know better than to give examples of the latter in a public blog post. Rest assured it is very true and the effects on the company have cost it millions, if not billions of dollars.


 

Categories: Life in the B0rg Cube

One of the biggest assumptions I had about software development was shattered when I started working on the XML team at Microsoft. This assumption was that standards bodies know what they are doing and produce specifications that are indisputable. However I've come to realize that the problems of design by committee affects illustrious names such as the W3C and IETF just like everyone else. These problems become even more pernicious when trying to combine technologies defined in multiple specifications to produce a coherent end to end application.

An example of the problem caused by contradictions in core specifications of the World Wide Web is summarized in Mark Pilgrim's article, XML on the Web Has Failed. The issue raised in his article is that determining the encoding to use when processing an XML document retrieved off the Web via HTTP, such as an RSS feed, is defined in at least three specifications which contradict each other somewhat; XML 1.0, HTTP 1.0/1.1 and RFC 3023. The bottom line being that most XML processors including those produced by Microsoft ignore one or more of these specifications. In fact, if applications suddenly started following all these specifications to the letter a large number of the XML documents on the Web would be considered invalid. In Mark Pilgrim's article, 40% of 5,000 RSS feeds chosen at random would be considered invalid even though they'd work in almost all RSS aggregators and be considered well-formed by most XML parsers including the System.Xml.XmlTextReader class in the .NET Framework and MSXML.

The newest example, of XML specifications that should work together but instead become a case of putting square pegs in round holes is Daniel Cazzulino's article, W3C XML Schema and XInclude: impossible to use together??? which points out

The problem stems from the fact that XInclude (as per the spec) adds the xml:base attribute to included elements to signal their origin, and the same can potentially happen with xml:lang. Now, the W3C XML Schema spec says:

3.4.4 Complex Type Definition Validation Rules

Validation Rule: Element Locally Valid (Complex Type)
...

3 For each attribute information item in the element information item's [attributes] excepting those whose [namespace name] is identical to http://www.w3.org/2001/XMLSchema-instance and whose [local name] is one of type, nil, schemaLocation or noNamespaceSchemaLocation, the appropriate case among the following must be true:

And then goes on to detailing that everything else needs to be declared explicitly in your schema, including xml:lang and xml:base, therefore :S:S:S.

So, either you modify all your schemas to that each and every element includes those attributes (either by inheriting from a base type or using an attribute group reference), or you validation is bound to fail if someone decides to include something. Note that even if you could modify all your schemas, sometimes it means you will also have to modify the semantics of it, as a simple-typed element which you may have (with the type inheriting from xs:string for example), now has to become a complex type with simple content model only to accomodate the attributes. Ouch!!! And what's worse, if you're generating your API from the schema using tools such as xsd.exe or the much better XsdCodeGen custom tool, the new API will look very different, and you may have to make substancial changes to your application code.

This is an important issue that should be solved in .NET v2, or XInclude will be condemned to poor adoption in .NET. I don't know how other platforms will solve the W3C inconsistency, but I've logged this as a bug and I'm proposing that a property is added to the XmlReaderSettings class to specify that XML Core attributes should be ignored for validation, such as XmlReaderSettings.IgnoreXmlCoreAttributes = true. Note that there are a lot of Ignore* properties already so it would be quite natural.

I believe this is a significant bug in W3C XML Schema that it requires schema authors to declare up front in their schemas where xml:lang, xml:base or xml:base may occur in their documents. Since I used to be the program manager for XML Schema technologies in the .NET Framework this issue would have fallen on my plate. I spoke to Dave Remy who toke over my old responsibilities and he's posted his opinion about the issue in his post XML Attributes and XML Schema .  Based on the discussion in the comments to his post it seems the members of my old team are torn on whether to go with a flag or try to push an errata through the W3C. My opinion is that they should do both. Pushing an errata through the W3C is a time consuming process and in the meantime using XInclude in combination with XML Schema is signficantly crippled on the .NET Framework (or on any other platform that supports both technologies). Sometimes you have to do the right thing for customers instead of being ruled by the letter of standards organizations especially when it is clear they have made a mistake.

Please vote for this bug on the MSDN Product Feedback Center


 

Categories: XML

So I just read an interesting post about Technorati Tags on Shelley Powers's blog entitled Cheap Eats at the Semantic Web Café. As I read Shelley's post I kept feeling a strong sense of deja vu which I couldn't shake. If you were using the Web in the 1990s then the following descriptions of Technorati Tags taken from their homepage should be quite familiar.

What's a tag?

Think of a tag as a simple category name. People can categorize their posts, photos, and links with any tag that makes sense.

....

The rest of the Technorati Tag pages is made up of blog posts. And those come from you! Anyone with a blog can contribute to Technorati Tag pages. There are two ways to contribute:

  • If your blog software supports categories and RSS/Atom (like Movable Type, WordPress, TypePad, Blogware, Radio), just use the included category system and make sure you're publishing RSS/Atom and we'll automatically include your posts! Your categories will be read as tags.
  • If your blog software does not support those things, don't worry, you can still play. To add your post to a Technorati Tag page, all you have to do is "tag" your post by including a special link. Like so:
    <a href="http://technorati.com/tag/[tagname]" rel="tag">[tagname]</a>
    The [tagname] can be anything, but it should be descriptive. Please only use tags that are relevant to the post. No need to include the brackets. Just make sure to include rel="tag".

    Also, you don't have to link to Technorati. You can link to any URL that ends in something tag-like. These tag links would also be included on our Tag pages:
    <a href="http://apple.com/ipod" rel="tag">iPod</a>
    <a href="http://en.wikipedia.org/wiki/Gravity" rel="tag">Gravity</a>
    <a href="http://flickr.com/photos/tags/chihuahua" rel="tag">Chihuahua</a>

If you weren't using the Web in the 1990s this may seem new and wonderful to you but the fact is we've all seen this before. The so-called Technorati Tags are glorified HTML META tags with all their attendant problems. The reason all the arguments in Shelley's blog post seemed so familiar is that a number of them are the same ones Cory Doctorow made in his article Metacrap from so many years ago. All the problems with META tags are still valid today most important being the fact that people lie especially spammers and that even well intentioned people tend to categorize things incorrectly or according to their prejudices.

META tags simply couldn't scale to match the needs of the World Wide Web and are mostly ignored by search engines today. I wonder why people think that if you dress up an old idea with new buzzwords (*cough* folksonomies *cough* social tagging *cough*) that it somehow turns a bad old idea into a good new one?


 

Categories: Technology

I noticed an eWeek article this morning titled Microsoft Won't Bundle Desktop Search with Windows which has had me scratching my head all morning. The article contains the following excerpts

Microsoft Corp. has no immediate plans to integrate desktop search into its operating system, a company executive said at a conference here this weekend.
...
Indeed, while including desktop search in Windows might seem like a logical step to many, "there's no immediate plan to do that as far as I know," Kroese said. "That would have to be a Bill G. [Microsoft chairman and chief software architect Bill Gates] and the lawyers' decision."

I thought Windows already had desktop search. In fact, Jon Udell of  Infoworld recently provided a screencast in his blog post Where was desktop search when we needed it? which shows off the capabilities of the built-in Windows desktop search which seems almost on par with the recent offerings from MSN, Google and the like.

Now I'm left wondering what the EWeek article means. Does it mean there aren't any plans to replace the annoying animated dog  with the MSN butterfly? That Microsoft has access to a time machine and will go back and rip out desktop search from the operating system including the annoying animated dog? Or is there some other obvious conclusion that can be drawn from the facts and the article that I have failed to grasp?

The technology press really disappoints me sometimes.


 

Categories: Technology