February 26, 2004
@ 05:54 PM

Aaron Swartz has lots of interesting ideas about politics and copyright in the age of digital media. I disagree with a lot of his ideas on both topics, but they are often well thought out and interesting. This month he continues his trend of interesting posts about politics with two entries: Up is Down: How Stating the False Hides the True, excerpted below,

One of the more interesting Republican strategies is saying things whose opposite is true. They say that the Democratic nominee is bought off by special interests, the Democrats are outspending them, the Democrats are playing dirty, the Democrats don’t care about homeland security, the Democrats hate America, all when this is far more true of the Republicans. They say Joseph McCarthy was a noble man, the media has a liberal bias, affirmative action is bad for equality, Saddam had weapons of mass destruction, and Ronald Reagan was our greatest President, all when the opposite is far more true.

At first glance this seems bizarre — why draw attention to your weaknesses? But it’s actually a very clever use of the media. The media tries hard to be “fair and balanced”, and it generally believes the best way to do this is to present the opinions from both sides and make as few judgement calls as possible (to avoid introducing their own bias). And if there’s a debate on some issue, taking a side is seen as a judgement call.

and Down is Up: What This Stuff Is, where he writes

I got a lot of responses to my previous post, Up is Down, along the lines of “oh, the Democrats lie as much as the Republicans”. But the piece was not about lies. For lack of a better term, it was about anti-truths. Anti-truths have two parts:

  1. They’re completely false.
  2. They’re more accurate when directly reversed.

It’s hard to find a completely unobjectionable one, but take “Ronald Reagan was our greatest President.” As for part one, I have seen no evidence that Reagan actually did anything particularly good on purpose and as for two, “Ronald Reagan was our worst President” seems to be a far more accurate statement, since he did lots of things that were quite bad.

My example of an anti-truth would have been “John Ashcroft respects the US constitution”. :)


 

February 26, 2004
@ 05:45 PM

In his post WinFS Scenario #2: event planning Jeremy Mazner writes

So as you can see, the information for any given event is spread all over the place, which means I can’t really keep track of it all.  If I want to see the invite list and RSVP status, I’m either in Outlook (if I’m lucky) or off to some file share to find the spreadsheet.  If I want to see some information about an invitee (their phone number, or who their account manager is), it’s off to the directory or CRM system to look it up.  If I want to know what presentations have been approved for use, I crawl through email to find the one message from my general manager where he says the presentation is ready.  If I want to see the actual presentation, it’s back to a file share or Sharepoint

What I really want is a way to corral all these related items together, stick them in a big bucket with a label that says what event they’re for.  I want a simple UI where I can see what events are coming up, then browse through all the related material for each one, and maybe be able to answer some simple questions: how many presentations for this event are in the Approved state?  How many attendees have declined the invitation?

 

I’ll assert that this is really, really hard to do today.  Outlook wizards would probably argue that you could do this with some series of categories, public folders, and shared calendars.  SharePoint gurus would say this is exactly what a Meeting Workspace is for.  Old school event planners might claim you could track this all in a nice big spreadsheet with multiple pages and links out to file locations.

This is a valid problem that Jeremy brings up in his scenario and one I've thought about in the past when it comes to tying together information about the same person from disparate applications. Outlook comes closest to doing what I'd like to see here, but it gets help from Exchange. I was curious as to how Jeremy thought WinFS could help here, and his thoughts were similar to what I'd first thought when I heard about WinFS. Specifically, he wrote

What does WinFS provide that will help?

  • A common storage engine, one unified namespace for storage of any application data on your machine.  Whether I use Outlook, AOL Communicator, or Notes, my emails can all be stored in WinFS.  (Yes, I understand that new versions of these apps will have to be built…encouraging that is my job as evangelist.)
  • A set of common schemas, so that an email is an email is an email, no matter what app created it, and it always has a To:, From: and Subject: that I can access through the same API.
  • A data model that supports relationships, so that my event management app can specify “this email from Bill is related to the Longhorn Design Review event, as is this calendar appointment for next month”
  • A data model that supports extensions and meta-data on relationships, so that I not only say “this contact Jon is associated with this Design Review event”, but also “Jon is a speaker at this event” and “Jon is the author of this deck that he’ll present” and “Jon has not yet confirmed attendance at the party afterwards.”
  • Win32 file system access, so that even though files are stored in WinFS, applications can still get to their streams via a Win32 path

It seems Jeremy and I had the same thoughts about how WinFS could help in this regard. After thinking about the problem for a little bit, I realized that having all applications store similar information in a central repository brings a number of problems with it. The two main problems I can see are unreliable applications causing data corruption, and security. As an example, imagine that RSS Bandit, SharpReader, and FeedDemon all stored their data in WinFS using the model that Jeremy describes above. This means all the RSS/Atom feeds and configuration data used by each application are not stored in application-specific folders as is done today, but in a unified store of RSS items. Here are two real problems that need to be surmounted:

  1. In the past, bugs in RSS Bandit that led to crashes also caused corrupted feed files. It's one thing for bugs in an application to corrupt its own data or configuration files, but another for it to corrupt globally shared data. This is akin to the pain users and developers have felt when buggy apps corrupt the registry. The WinFS designers will have to account for this occurrence in some way, even if it is just by coming up with application design guidelines.

  2. For feeds that require authentication, RSS Bandit stores the user name and password required to access the feed in configuration files. Having such data globally shared means that proper security precautions must be taken. Just because a user has entered a password in RSS Bandit doesn't mean they want it exposed to other applications, especially potentially malicious ones. Of course, this is no different from what can happen today in Windows (e.g. most modern viruses/worms search the file system for email address book files to locate new victims to email themselves to), but with the model being pushed by WinFS this becomes a lot easier. A way to share data between WinFS-aware applications in a secure manner is a must-have.
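To make the two concerns above concrete, here is a minimal sketch of a shared item store that validates writes so one buggy application can't corrupt globally shared data, and gates stored passwords behind a simple per-application check. This is purely illustrative Python with entirely made-up names; WinFS's actual APIs look nothing like this.

```python
class SharedFeedStore:
    """Hypothetical unified store of RSS items shared by all aggregators."""

    REQUIRED_FIELDS = {"title", "link"}

    def __init__(self):
        self._items = []        # globally shared feed items
        self._credentials = {}  # (feed_url, owner_app) -> password

    def add_item(self, app, item):
        # Validate before committing, so a buggy app can only fail its
        # own write instead of corrupting the shared store.
        if not self.REQUIRED_FIELDS <= item.keys():
            raise ValueError(f"{app}: item is missing required fields")
        self._items.append(dict(item))

    def store_credential(self, app, feed_url, password):
        self._credentials[(feed_url, app)] = password

    def get_credential(self, app, feed_url):
        # Simple per-application check: only the app that stored a
        # password may read it back.
        try:
            return self._credentials[(feed_url, app)]
        except KeyError:
            raise PermissionError(f"{app} may not read credentials for {feed_url}")
```

Something along these lines, with the platform rather than each application enforcing the checks, is the kind of design guideline I suspect the WinFS folks will need to spell out.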

Neither of these problems is insurmountable, but they do point out that things aren't as easy as one suspects at first glance at WinFS. I pass Mike Deem in the hallway almost daily; I should talk to him about this stuff more often. In the meantime I'll be sure to catch the various WinFS designers on the next episode of the .NET Show on MSDN.


 

Categories: Technology

February 25, 2004
@ 12:16 PM

In his post JDOM Hits Beta 10 Jason Hunter writes

According to my Palm Pilot calendar, we laid out the vision for JDOM on March 28th, 2000. I figure we'll ship before March 28, 2004. If we can ship 1.0 before it's been a full four years, I can just round down and call it three. :-)

What took it so long? Several things. I discovered XML is "fractally complex". At the highest level it appears only slightly complicated, but as you dig deeper you discover increasing complexity, and the deeper you go the more complicated it continues to become. Trying to be faithful to the XML standards while staying easy to use and intuitive was a definite challenge.

This is one of the challenges I face in my day job designing XML APIs for the .NET Framework. The allure of XML and its related technologies is that they appear simple and straightforward, but once one digs a little it turns out that everything isn't quite as easy as it seemed at first.

One of the drawbacks of this appearance of simplicity is that everyone thinks they can write an XML parser, which leads to occurrences such as the one described in Shawn Farkas's post Creating a SecurityElement from XML

The overhead of a full-fledged XML parser would be too much. Even if you accept the fact that we need a lightweight security XML object, we can't even provide utility methods on SecurityElement to convert back and forth System.Xml objects, since the CAS code lives in mscorlib.dll, and mscorlib cannot take a dependency on external DLL's. (Think of what would happen if mscorlib depended on System.Xml.dll, and System.Xml.dll depended on mscorlib ...). As if this weren't enough, there are at least 3 distinct XML parsers in v1.1 of the framework (System.Xml, SecurityElement, and a lightweight parser in mscoree.dll which handles parsing .config files ... this was actually optimized to be able to fit into no more than two pages of memory). Whidbey will be adding yet another parser to handle parsing ClickOnce manifests

One of the things I'm currently working on is coming up with guidelines that prevent occurrences like System.Security.SecurityElement, a class that represents XML but does not interact well with the rest of the XML APIs in the .NET Framework, from happening again. This will be akin to Don Box's MSDN TV episode Passing XML Data in the CLR, but will take the form of an Extreme XML article and a set of .NET Framework design guidelines.
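Jason's "fractally complex" observation is easy to demonstrate. The sketch below (Python, purely illustrative) shows a naive regex-based tag scanner that looks reasonable on trivial documents but reports a phantom element as soon as a CDATA section appears, while a real parser gets it right. This is exactly the trap that makes everyone think they can write an XML parser.

```python
import re
import xml.etree.ElementTree as ET

def naive_tags(xml):
    # Looks plausible: grab every element name that follows a '<'.
    return re.findall(r"<([A-Za-z][\w.-]*)", xml)

simple = "<config><feed>http://example.com/rss</feed></config>"
tricky = "<config><![CDATA[<feed>not markup, just text</feed>]]></config>"

# A real parser knows the CDATA section is character data, not markup,
# so the only element in 'tricky' is <config>.
real_tags = [e.tag for e in ET.fromstring(tricky).iter()]
```

The naive scanner reports a `feed` element inside the CDATA section that simply isn't there, and CDATA is only the first of many such layers (comments, processing instructions, entities, namespaces...).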


 

Categories: XML

February 22, 2004
@ 09:08 PM

Yesterday my mom and I went on a train ride that is often billed as a way for couples to spend a special occasion. The train was full of couples celebrating anniversaries, birthdays and other special occasions. Quite a number of couples were making out openly at the end of the train ride, whose main features are a picturesque dinner on the train and a stop with a tour of a local winery.

One of the less romantic aspects of this train ride is that for the most part you have to share a table with another couple, facing them. This means they get to overhear your conversation and interrupt it. The couple we shared our table with were celebrating the guy's birthday; his girlfriend was treating him to a special day out that ended with the train ride. After we got back on the train from the winery tour, the unexpected happened. They were engaged in conversation and he was comparing her favorably to ex-girlfriends, then all of a sudden he got down on one knee and pulled out a box with a ring in it. After a stunned silence she took it, said some words softly, then said “I appreciate the sentiment but the timing is inappropriate” and handed it back. This was followed by her voicing her concerns about his ability to support them and him rattling off how much he made a month plus various bonuses, etc. I think it went downhill from there.

All through this I was staring out the window, trying to make small talk with my mom but failing miserably. If I ever do end up proposing to someone, I've definitely learned a thing or two about what not to do.


 

Categories: Ramblings

Jon Udell writes in his entry Heads, decks, and leads: revisited

Yesterday, for example, Steve Gillmor told me that he's feeling overwhelmed by thousands of unread items in NetNewsWire. Yet I never feel that way. I suspect that's because I'm reading in batches of 100 (in the Radio UserLand feedreader). I scan each batch quickly. Although opinions differ as to whether or not a feed should be truncated, my stance (which I'm reversing today) has been that truncation is a useful way to achieve the effect you get when scanning the left column of the Wall Street Journal's front page. Of the 100 items, I'll typically only want to read several. I open them into new Mozilla tabs, then go back and read them. Everybody's different, but for me -- and given how newspapers work, I suspect for many others too -- it's useful to separate the acts of scanning and reading. When I'm done with the batch, I click once to delete all 100 items.

and in today's post entitled Different strokes he writes

I agree. In trying to illustrate a point about scanning versus reading, I'm afraid I fanned the flames of the newsreader-style versus browser-style debate. In fact, the two modes can be complementary. I just bought the full version of NetNewsWire, which exploits that synergy as Brent describes. So does FeedDemon, which this posting prompted me to re-explore.

This highlights a conflict between the traditional 3-pane aggregators, which follow the mail or news reader model implying that every post is important and should be read one by one, and web-style aggregators like Radio UserLand, which present feeds in a unified web-based view reminiscent of an aggregated blog or newspaper. On the RSS Bandit wiki there's a wishlist item that reads

  Newspaper view. A summary of unread feed items, formatted by an XSLT stylesheet and displayed as HTML/PDF. Inspired by Don Park. also here

which was originally added by Torsten. He never got around to adding this feature because he felt it wasn't that useful after all. I never implemented it because one would have to provide a way to interact with posts from this newspaper view (i.e. mark them as read or deleted, view comments, etc.), which translates to either JavaScript coding or running a local web server. Neither of the options was palatable.

This morning I downloaded FeedDemon to see how it got around these problems for its newspaper view. I found out that it does the obvious thing: it doesn't. From what I gather, there is an option to 'mark all items in a channel as read' once you leave the channel. So once you close the newspaper view, it assumes every post that showed up in it was read. A heavy-handed approach, but it probably works for the most part.
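For illustration, here is a rough Python sketch of that heavy-handed model as I understand it; the function names are invented, and this is not FeedDemon's (or RSS Bandit's) actual code.

```python
from html import escape

def render_newspaper(channel, items):
    # Render every unread item of a channel as one scannable HTML page.
    rows = "\n".join(
        f"<h3><a href='{escape(i['link'])}'>{escape(i['title'])}</a></h3>"
        for i in items if not i["read"]
    )
    return f"<html><body><h1>{escape(channel)}</h1>\n{rows}\n</body></html>"

def close_newspaper(items):
    # The 'obvious thing': once the view is closed, assume everything
    # that showed up in it was read.
    for i in items:
        i["read"] = True
```

The appeal is that no per-item interaction (and hence no JavaScript or local web server) is needed; the cost is that skimming the page counts as reading everything on it.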

Looks like something else to add to the RSS Bandit TODO list.

I've been thinking that something like this is necessary after reading Robert Scoble's post 1296 newsfeeds +are+ sustainable where he wrote

Here's my workflow:

At about 5 p.m. every day I tell NewsGator to get me my feeds. It is downloading them in the background as I speak.

Then I open each folder that's bold...

Then I only read the headlines. I'm getting very good at ignoring headlines with subjects like "isn't my cat cute?" See, that's another productivity point. Robin probably assumes I read all the crap that people post. I don't. I only read those things that MIGHT be interesting. If I find a headline that's interesting, then I scan the article it is associated with. I don't read it. Just scan at that point. Usually that means reading the first paragraph and scanning the rest for later.

I've found that reading headlines isn't always the best way to find good stuff and wouldn't mind a way to quickly scan all the articles in a category that goes beyond eyeballing a bunch of headlines. However, I'm going to avoid Jon Udell's advice that XHTML-izing all the HTML content in feeds is the way to get there. Been there, done that, not going back. The approach used by FeedDemon is a step in the right direction and doesn't require absorbing the problems that come with trying to convert the ill-formed markup that typically shows up in feeds to XHTML.


 

Categories: RSS Bandit | XML

In his post entitled Back in the Saddle Don Box writes

My main takeaway was that it's time to get on board with Atom - Sam is a master cat herder and I for one am ready to join the other kittens. 

This is good news. Anyone who's read my blog can probably discern that I think the ATOM syndication format is a poorly conceived waste of effort that unnecessarily fragments the website syndication world. On the other hand, the ATOM API, especially the bits about SOAP-enabled clients, is a welcome upgrade to the existing landscape of blog posting/editing APIs.

My experiences considering how to implement the ATOM API in RSS Bandit have highlighted one or two places where the API seems 'problematic', which actually point more to holes in the XML Web Services architecture than to actual problems with the API. The two scenarios that come most readily to mind are

  1. Currently, if a user wants to post to their blog using client software they need to configure all sorts of technical settings such as which API to use, port numbers, end point URLs and a lot more. For example, look at what one has to do to post to a dasBlog weblog from w.bloggar. Ideally, the end user should just be able to point their client at their blog URL (e.g. http://www.25hoursaday.com/weblog) and have it figure out the rest.

    The current ATOM specs describe a technique for discovering the web service end points a blog exposes which involves downloading the HTML page and parsing out all the <link> tags. I've disagreed with this approach in the past but the fact is that it does get the job done.

    What this situation has pointed out to me is that there is no generic way to go up to a website and find out what XML Web Service end points it exposes. For example, if you wanted to find all the publicly available Web Services provided by Microsoft you'd have to read Aaron Skonnard's A Survey of Publicly Available Web Services at Microsoft instead of somehow discovering this programmatically. Maybe this is what UDDI was designed for?

  2. Different blogs allow different syntax for posting comments. I've lost count of the number of times I've posted a comment to a blog and wanted to provide a link but couldn't tell whether to just use a naked URL (http://www.example.com) or a hyperlink (<a href=“http://www.example.com“>example link</a>). Given that RSS Bandit has supported the CommentAPI for a while now, I've constantly been frustrated by the inability to tell what kind of markup or markup subset a blog allows in comments. A couple of blogs provide formatting rules when one is posting a comment, but there really is no programmatic way to discover this.

    Another class of capabilities I'd like to discover dynamically is which features a blog supports. For instance, the ATOM API spec used to have a 'Search facet' which was removed because many people thought it'd be onerous to implement. What I'd have preferred would have been for it to be optional; then clients could dynamically discover whether the ATOM end point had search capabilities and, if so, how rich they were.

    The limitation here is that there isn't a generic way to discover and enunciate the fine grained capabilities of an XML Web Service end point. At least not one I am familiar with.
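For what it's worth, the <link>-tag autodiscovery technique from the first scenario is straightforward to sketch. The Python below pulls service end points out of a blog's HTML page; the rel values follow my reading of the ATOM 0.3-era drafts, so treat the details as assumptions rather than the final spec.

```python
from html.parser import HTMLParser

class ServiceLinkFinder(HTMLParser):
    """Collects <link rel="service.*"> end points from a blog's HTML page."""

    def __init__(self):
        super().__init__()
        self.endpoints = {}

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            a = dict(attrs)
            if a.get("rel", "").startswith("service."):
                self.endpoints[a["rel"]] = a.get("href")

# A made-up blog home page advertising its ATOM end points.
page = """<html><head>
<link rel="service.post" type="application/x.atom+xml"
      href="http://www.example.com/atom" title="my weblog" />
<link rel="service.feed" type="application/x.atom+xml"
      href="http://www.example.com/atom/feed" title="my weblog" />
</head><body>...</body></html>"""

finder = ServiceLinkFinder()
finder.feed(page)
```

It does get the job done, but note that it only answers "where are the end points?", not the harder questions above about what capabilities those end points actually support.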

It would be nice to see what someone like Don Box can bring to the table in showing how to architect and implement such a loosely coupled XML Web Service based system on the World Wide Web.


 

Categories: Technology | XML

February 21, 2004
@ 01:23 PM

Torsten and I have fixed about a dozen bugs since the v1.2.0.90 release and implemented one or two minor features. There are two major issues we'd like to tackle in the next few weeks, then ship a minor release, then work on the next major version of RSS Bandit. The issues are both performance-related:

  1. High Memory Consumption: We don't consume as much memory as the other free .NET Framework-based aggregator, SharpReader, which I've seen consume over 100MB of memory, but we do stay in the 30MB to 60MB range, which is excessive. I'm pretty sure I have a good idea what the prime culprits are for the memory issues but currently can't think of a good way to reduce the memory consumption without removing or degrading some features. However, our perf goals are to reduce those numbers by half in the next few weeks.

  2. Feeds with Lots of Posts Take Too Long to Load: This is related to one of the culprits in the previous problem. If you are subscribed to a feed such as Weblogs @ ASP.NET, which gets 50-100 posts a day (about 1500-3000 posts a month), there is a perceptible slowdown in how long it takes to load the listview when you click on the feed.
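One plausible line of attack on the second problem is to virtualize the listview: build UI rows only for the slice of posts that is actually visible, rather than all 3000 when the feed is clicked. The Python below is only a sketch of the idea, with invented names, and is not RSS Bandit's actual implementation.

```python
class VirtualItemList:
    """Materializes listview rows on demand instead of all at once."""

    def __init__(self, items, window_size=50):
        self._items = items
        self.window_size = window_size
        self.rows_built = 0   # UI rows actually constructed so far

    def visible_rows(self, first):
        # Build rows only for the window the user can currently see.
        window = self._items[first:first + self.window_size]
        self.rows_built += len(window)
        return [f"{i['title']} ({i['date']})" for i in window]

# Roughly a month of Weblogs @ ASP.NET.
posts = [{"title": f"Post {n}", "date": "2004-02"} for n in range(3000)]
view = VirtualItemList(posts)
first_page = view.visible_rows(0)
```

Clicking the feed then costs 50 rows instead of 3000, with the rest paged in as the user scrolls.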

We'll be fixing bugs and implementing minor features along the way while getting the aforementioned performance issues under control. Once we are satisfied with the perf, we'll have a beta period and then ship a release. This should be within the next month.

After this there are a number of significant features we have slated such as NNTP support, subscription harmonization using SIAM, translations to multiple languages (German being the first) and better integration with IE (such as supporting the Favorites menu).  The release where these features show up will be in 2 or 3 months.

In the meantime, Torsten and I will be discussing RSS Bandit development in our blogs and on the RSS Bandit mailing list


 

Categories: RSS Bandit

Torsten figured out how to place background images in our XSLT styled themes for RSS Bandit this past week. As a celebration, we'll be including the following theme in the default install.

Personally, I'd have preferred Torsten's Christina Aguilera theme. :)


 

Categories: RSS Bandit

Daniel Cazzulino is writing about the W3C XML Schema type system <-> CLR type system and has an informal poll at the bottom of his article where he writes

We all agree that many concepts in WXS don't map to anything existing in OO languages, such as derivation by restriction, content-ordering (i.e. sequence vs choice), etc. However, in the light of the tools the .NET Framework makes available to map XML to objects, we usually have to analyze WXS (used to define the structure of that very XML instance to be mapped) and its relation with our classes
In this light, I'm conducting a survey about developer's view on the relation of the XSD type system and the .NET one. Ignoring some of the more advanced (I could add cumbersome and confusing) features of WXS, would you say that both type systems fit nicely with each other?

I find the question at the end of his post, which I highlighted, to be highly tautological. His question is basically, “If you ignore the parts where they don't fit well together, do the CLR and XSD type systems fit well together?”. Well, if you ignore the parts where they don't, then the only answer is YES. In reality many developers don't have the freedom to ignore the parts of XSD they don't want to support, especially when utilizing XML Web Services designed by others.

There are two primary ways one can utilize the XmlSerializer, which maps between XSD and CLR types:

  1. XML Serialization of Object State: In this case the developer is only interested in ensuring that the state of his classes can be converted to XML. This is a fairly simple problem because the expressiveness of the CLR is a subset of that of W3C XML Schema. Any object's state could be mapped to an element of complex type containing a sequence or choice of other nested elements that are either nested simple types or complex types.

    Even then there are limitations in the XmlSerializer which make this cumbersome such as the fact that it only serializes public fields but not public properties. But that is just a design decision that can be revisited in future releases.

  2. Conversion of XML to Objects: This is the scenario where a developer converts an XML schema to CLR objects to make it easier to program against. This is particularly common in XML Web Services scenarios, which is what the XmlSerializer was originally designed for. In this scenario the conversion tool has to contend with the breadth of features in the XML Schema: Structures and XML Schema: Datatypes recommendations.

    There are enough discrepancies between the W3C XML Schema type system and that of the CLR to fill a Ph.D. thesis. I touched on some of these in my article XML Serialization in the .NET Framework, such as

    Q: What aspects of W3C XML Schema are not supported by the XmlSerializer during conversion of schemas to classes?

    A: The XmlSerializer does not support the following:

    • Any of the simple type restriction facets besides enumeration.
    • Namespace based wildcards.
    • Identity constraints.
    • Substitution groups.
    • Blocked elements or types.

    After gaining more experience working with the XmlSerializer and talking to a number of customers, I wrote some more about the impedance mismatches in my article XML Schema Design Patterns: Is Complex Type Derivation Unnecessary?, specifically

    For usage scenarios where a schema is used to create strongly typed XML, derivation by restriction is problematic. The ability to restrict optional elements and attributes does not exist in the relational model or in traditional concepts of type derivation from OOP languages. The example from the previous section where the email element is optional in the base type, but cannot appear in the derived type, is incompatible with the notion of derivation in an object oriented sense, while also being similarly hard to model using tables in a relational database.

    Similarly, changing the nillability of a type through derivation is not a capability that maps to relational or OOP models. On the other hand, the example that doesn't use derivation by restriction can more straightforwardly be modeled as classes in an OOP language or as relational tables. This is important given that it reduces the impedance mismatch which occurs when attempting to map the contents of an XML document into a relational database or convert an XML document into an instance of an OOP class.
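The optional-email example above can be made concrete. In the Python sketch below (class names invented purely for illustration), modeling derivation by restriction as subclassing produces a "subtype" that rejects values its base type accepts, which violates the usual substitutability expectation of OOP type derivation.

```python
class Contact:
    """Base type: email is optional, like an optional element in XSD."""
    def __init__(self, name, email=None):
        self.name = name
        self.email = email

class ContactNoEmail(Contact):
    """'Derived by restriction': the email element must not appear."""
    def __init__(self, name, email=None):
        if email is not None:
            # The restricted 'subtype' refuses input the base type
            # accepts, which has no analogue in OOP subtyping.
            raise TypeError("email is not allowed in the restricted type")
        super().__init__(name)
```

Code written against the base type can no longer safely hand any Contact-shaped data to the derived type, which is exactly the mismatch a schema-to-class conversion tool has to paper over.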

I'm not the only one at Microsoft who's written about this impedance mismatch or tried to solve it. Gavin Bierman, Wolfram Schulte and Erik Meijer wrote in their paper Programming with Circles, Triangles and Rectangles an entire section about this mismatch. Below are links to descriptions of a couple of the mismatches they found most interesting

The mismatch between XML and object data-models
     Edge-labelled vs. Node-labelled
     Attributes versus elements
     Elements versus complex and simple types
     Multiple occurrences of the same child element
     Anonymous types
     Substitution groups vs derivation and the closed world assumption
     Namespaces, namespaces as values
     Occurrence constraints part of container instead of type
     Mixed content

There is a lot of discussion one could have about the impedance mismatch between the CLR type system and the XSD type system but one thing you can't say is that it doesn't exist or that it can be ignored if building schema-centric applications.

In conclusion, the brief summary is that if one is mapping objects to XML for the purpose of serializing their state then there is a good match between the CLR & XSD type systems, since the XSD type system is more expressive than the CLR type system. On the other hand, if one is trying to go from XSD to the CLR type system there are significant impedance mismatches, some of which are limitations of the current tools (e.g. the XmlSerializer could code-gen range checks for derivation by restriction of simple types or uniqueness tests for identity constraints) while others are fundamental differences between the XSD type system and object oriented programming, such as the difference between derivation by restriction in XSD and type derivation in OOP languages.


     

Categories: XML

Recently ZDNet ran an article entitled Google spurns RSS for rising blog format where it stated

The search giant, which acquired Blogger.com last year, began allowing the service's million-plus members to syndicate their online diaries to other Web sites last month. To implement the feature, it chose the new Atom format instead of the widely used, older RSS.

I've seen some discussion about the fact that Google only provides feeds for certain blogs in the ATOM 0.3 syndication format, which is an interim draft of the spec that is part of an effort being driven by Sam Ruby to replace RSS and related technologies. When I first read this I ignored it because I didn't have any Blogger.com feeds that were of interest to me. This changed today. This afternoon I found out that Steve Saxon, the author of the excellent article XPath Querying Over Objects with ObjectXPathNavigator, has a Blogger.com blog that only provides an ATOM feed. Since I use RSS Bandit as my aggregator of choice, I cannot subscribe to his feed, nor could I with a large percentage of the existing news aggregators.

What I find particularly stunning about Google's decision is that they have removed support for an existing, widely supported format in favor of an interim draft of a format which, according to Sam Ruby's slides for the O'Reilly Emerging Technology Conference, is several months away from being completed. An appropriate analogy for what Google has done would be AOL abandoning support for HTML and changing all of its websites to use the May 6th 2003 draft of the XHTML 2.0 spec. It simply makes no sense.

Some people, such as Dave Winer, believe Google is engaging in such user-unfriendly behavior for malicious reasons, but given that Google doesn't currently ship a news aggregator there doesn't seem to be much of a motive there (of course, this changes once they ship one). I recently stumbled across an article entitled The Basic Laws of Human Stupidity which describes the following 5 laws:

  1. Always and inevitably everyone underestimates the number of stupid individuals in circulation.

  2. The probability that a certain person be stupid is independent of any other characteristic of that person.

  3. A stupid person is a person who causes losses to another person or to a group of persons while himself deriving no gain and even possibly incurring losses.

  4. Non-stupid people always underestimate the damaging power of stupid individuals. In particular non-stupid people constantly forget that at all times and places and under any circumstances to deal and/or associate with stupid people always turns out to be a costly mistake.

  5. A stupid person is the most dangerous type of person.

The only question now is “Is Google crazy, or crazy like a fox?” and only time will tell the answer to that question.


     

Categories: Technology