The Holy Grail in XML<->Object Mapping Technologies

March 28, 2004

@ 07:59 PM

I was reading a post by Rory Blyth in which he points to Steve Maine's explanation of the benefits of Prothon (an object-oriented programming language without classes). He writes:

One quote from Steve's post that has me thinking a bit, though, is the following:

The inherent extensibility and open content model of XML makes coming up with a statically typed representation that fully expresses all possible instance documents impossible. Thus, it would be cool if the object representation could expand itself to add new properties as it parsed the incoming stream.

I can see how this would be cool in a "Hey, that's cool" sense, but I don't see how it would help me at work. I fully admit that I might just be stupid, but I'm honestly having a hard time seeing the benefit. Right now, I'm grabbing XML in the traditional fashion of providing the name of the node that I want as a string key, and it seems to be working just fine.

In XML<->object mapping technologies, being able to dynamically add properties to a class lets developers program against parts of an XML document in a strongly typed manner even when those parts are not explicitly described in the document's schema.

This may not be obvious, so I'll provide an example that illustrates the point. David Orchard of BEA wrote a schema for the ATOM 0.3 syndication format. Below is the fragment of the schema that describes ATOM entries:

<xs:complexType name="entryType">
    <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="link" type="atom:linkType"/>
        <xs:element name="author" type="atom:personType" minOccurs="0"/>
        <xs:element name="contributor" type="atom:personType" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="id" type="xs:string"/>
        <xs:element name="issued" type="atom:iso8601dateTime"/>
        <xs:element name="modified" type="atom:iso8601dateTime"/>
        <xs:element name="created" type="atom:iso8601dateTime" minOccurs="0"/>
        <xs:element name="summary" type="atom:contentType" minOccurs="0"/>
        <xs:element name="content" type="atom:contentType" minOccurs="0" maxOccurs="unbounded"/>
        <xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="xml:lang" use="optional"/>
    <xs:anyAttribute/>
</xs:complexType>

The above schema fragment produces the following C# class when the .NET Framework's XSD.exe tool is run with the ATOM 0.3 schema as input.

/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://purl.org/atom/ns#")]
public class entryType {

    /// <remarks/>
    public string title;

    /// <remarks/>
    public linkType link;

    /// <remarks/>
    public personType author;

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("contributor")]
    public personType[] contributor;

    /// <remarks/>
    public string id;

    /// <remarks/>
    public string issued;

    /// <remarks/>
    public string modified;

    /// <remarks/>
    public string created;

    /// <remarks/>
    public contentType summary;

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("content")]
    public contentType[] content;

    /// <remarks/>
    [System.Xml.Serialization.XmlAnyElementAttribute()]
    public System.Xml.XmlElement[] Any;

    /// <remarks/>
    [System.Xml.Serialization.XmlAttributeAttribute(Namespace="http://www.w3.org/XML/1998/namespace")]
    public string lang;

    /// <remarks/>
    [System.Xml.Serialization.XmlAnyAttributeAttribute()]
    public System.Xml.XmlAttribute[] AnyAttr;
}

As a side note, I should point out that David Orchard's ATOM 0.3 schema is invalid because it refers to an undefined authorType, so I had to remove the reference from the schema to get it to validate.

The generated Any and AnyAttr fields show the problem that the ability to dynamically add fields to a class would solve. When programming against an ATOM feed using the above entryType class, as soon as you encounter an extension element you have to fall back to XML processing instead of programming with strongly typed constructs. For example, consider Mark Pilgrim's RSS feed, which has dc:subject elements that are not described in the ATOM 0.3 schema but are allowed because of the xs:any wildcard. Watch how this complicates the following code, which prints the title, issued date and subject of each entry.

foreach(entryType entry in feed.Entries){
    Console.WriteLine("Title: " + entry.title);
    Console.WriteLine("Issued: " + entry.issued);

    string subject = null;

    // find the dc:subject extension element
    // (Any may be null when an entry carries no extension elements)
    if(entry.Any != null){
        foreach(XmlElement elem in entry.Any){
            if(elem.LocalName.Equals("subject") &&
               elem.NamespaceURI.Equals("http://purl.org/dc/elements/1.1/")){
                subject = elem.InnerText;
                break;
            }
        }
    }

    Console.WriteLine("Subject: " + subject);
}

As you can see, one minute you are programming against statically and strongly typed C# constructs and the next you are back to checking the names of XML elements and programming against the DOM. If there were infrastructure that enabled one to dynamically add properties to classes, then even though the ATOM 0.3 schema doesn't define the dc:subject element, one would still be able to program against it in a strongly typed manner in generated classes. So one could write code like:

foreach(entryType entry in feed.Entries){
    Console.WriteLine("Title: " + entry.title);
    Console.WriteLine("Issued: " + entry.issued);
    Console.WriteLine("Subject: " + entry.subject);
}

Of course, there are still impedance mismatches to resolve, such as how to reflect the namespace names of elements or how to distinguish attributes from elements in the model, but having the capabilities Steve Maine describes in his original post would improve the XML<->Object mapping technologies that exist today.
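As a rough sketch of what such infrastructure could look like: C# 4.0 and later (which postdate this post) let a class answer property lookups at runtime through System.Dynamic.DynamicObject. The ExtensibleEntry wrapper below is a hypothetical name, not something XSD.exe or the XmlSerializer generates, and it sidesteps the namespace question by keying on local names only.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Xml;

// Hypothetical wrapper: exposes the wildcard extension elements of an entry
// (the generated Any array) as late-bound properties.
public class ExtensibleEntry : DynamicObject
{
    private readonly Dictionary<string, string> extensions =
        new Dictionary<string, string>();

    // Each extension element becomes a property named after its local name.
    // Assumption: namespaces are ignored, so two extensions that share a
    // local name collide and the last one wins.
    public ExtensibleEntry(XmlElement[] any)
    {
        if (any == null) return;
        foreach (XmlElement elem in any)
        {
            extensions[elem.LocalName] = elem.InnerText;
        }
    }

    // Invoked by the runtime binder for member reads such as entry.subject.
    // Returning false makes the binder throw at runtime.
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        string value;
        bool found = extensions.TryGetValue(binder.Name, out value);
        result = value;
        return found;
    }
}
```

Given a deserialized entryType called typedEntry, writing dynamic entry = new ExtensibleEntry(typedEntry.Any); makes entry.subject return the text of the dc:subject element. Asking for a member that never appeared in the XML fails with a RuntimeBinderException when the code runs rather than with a compiler error.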

Categories: XML


Sunday, 28 March 2004 20:30:24 (GMT Daylight Time, UTC+01:00)

Why does it make the date fields a string?

Robert Sayre

Sunday, 28 March 2004 20:42:15 (GMT Daylight Time, UTC+01:00)

They are defined to be of type atom:iso8601dateTime, which is a union of xs:date and xs:dateTime. Considering that (a) the CLR doesn't provide a natural way to model union types and (b) there is no CLR type that exactly matches xs:date, mapping the type to System.String is the best that the XmlSerializer can do.
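If typed dates are needed, the string can be parsed by hand. A minimal sketch, assuming the values are plain xs:date or xs:dateTime lexical forms and that a value without a time zone should be read as UTC (AtomDate is a hypothetical helper, not part of the generated code):

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not produced by XSD.exe.
public static class AtomDate
{
    // Accepted lexical forms: xs:date and xs:dateTime, each with an optional
    // time zone. "K" matches "Z", an offset such as +01:00, or nothing.
    private static readonly string[] Formats =
    {
        "yyyy-MM-ddK",
        "yyyy-MM-ddTHH:mm:ssK",
        "yyyy-MM-ddTHH:mm:ss.FFFFFFFK"
    };

    // Returns false when the text matches neither form; otherwise yields the
    // value converted to UTC.
    // Assumption: a value with no time zone is treated as UTC.
    public static bool TryParse(string text, out DateTime utc)
    {
        return DateTime.TryParseExact(
            text,
            Formats,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out utc);
    }
}
```

This collapses the union into a single DateTime, so the caller can no longer tell whether the feed supplied a bare date or a full timestamp. That loss is one reason a string is the safer default mapping.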

Dare Obasanjo

Monday, 29 March 2004 01:57:03 (GMT Daylight Time, UTC+01:00)

Hey, Dare -

First of all, thanks for responding. I'm hoping to get my head wrapped around this stuff.

So, what I don't understand is this: I'm understanding better now what Steve was talking about, but I still don't really understand the benefit. If you're interacting with a strongly typed representation of the XML, then that's something that benefits you when you're *coding*, but I don't see how it would help at runtime, which is when you would expect to encounter the unexpected.

Sorry - the problems are tough for me to formulate in sentences because I'm still learning what the problems are :)

I'm also confused because it seems to me that if you're adding properties dynamically at runtime, then don't you also have to modify the consumer of the data? Doesn't the calling class have to somehow "learn" about the added property so that it can make use of it? How does this work?

Please forgive me if I sound stupid, but I have to admit that I'm having some difficulty with getting this...

Rory

Monday, 29 March 2004 03:24:02 (GMT Daylight Time, UTC+01:00)

Rory,
There are some implicit assumptions I am making about how this would be implemented. Read Doug Purdy's blog post at http://www.douglasp.com/2003/05/13.html#a288 to see what I'm assuming. Basically, the assumption is that the programming language is dynamically and strongly typed (like Smalltalk) instead of statically and strongly typed (like C#). That way, accessing a non-existent member throws an exception at runtime instead of producing a compiler error.

For the code I wrote in the second attempt at printing the information in an ATOM feed, the process I assume takes place is that

1.) The language is dynamically typed so property and field accesses aren't checked until runtime.

2.) When the XML serializer sees extension elements in the XML it generates properties or fields for them in the entryType class.

3.) When the access of the subject property is made, all is OK because it exists in the class having been added by the XML serializer.

Of course, no system like this exists today and it is unlikely you'd see anything like this in the .NET Framework in the near future.

Dare Obasanjo

Monday, 29 March 2004 03:25:27 (GMT Daylight Time, UTC+01:00)

Having support for dynamic properties would really simplify the conversion from objects -> XML, too.

Having an AnyElements array on the type is good, but you can't put things in there in any strongly typed way. You're pretty much stuck with XmlElement as an API. It would be much easier if the entity class itself were "expando", so you could simply do:

entity.ContentRating = "Great!"

and have it show up in the appropriate wildcard element, even though the ContentRating field wasn't explicitly declared on the Entry type.
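A rough sketch of that idea, using the ExpandoObject type from later versions of .NET (4.0 onward, so not available at the time of this comment). ExpandoXml and its parameters are hypothetical names. The target namespace has to be passed in because the expando carries no namespace information of its own:

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Xml.Linq;

// Hypothetical helper: turns dynamically added members into XML elements.
public static class ExpandoXml
{
    // Writes every member of the expando as a child element of a new
    // element called name, placing each child in extensionNs.
    // Assumption: member names are valid XML local names.
    public static XElement ToElement(ExpandoObject entity, XName name, XNamespace extensionNs)
    {
        XElement root = new XElement(name);
        // ExpandoObject exposes its members through IDictionary<string, object>.
        foreach (KeyValuePair<string, object> member in (IDictionary<string, object>)entity)
        {
            root.Add(new XElement(extensionNs + member.Key, member.Value));
        }
        return root;
    }
}
```

After dynamic entity = new ExpandoObject(); and entity.ContentRating = "Great!";, passing the object to ExpandoXml.ToElement yields an element with a ContentRating child in the chosen namespace, with no XmlElement handling in the calling code.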

Steve Maine

Monday, 29 March 2004 18:48:05 (GMT Daylight Time, UTC+01:00)

Dare -

"2.) When the XML serializer sees extension elements in the XML it generates properties or fields for them in the entryType class.

3.) When the access of the subject property is made, all is OK because it exists in the class having been added by the XML serializer."

OK - Same ideas as in your post, but different words - works every time :)

I guess I just needed to hear it a few different ways, but I get it now.

Thanks :)

Rory


Dare Obasanjo's weblog
