I've been spending some time over the past couple of months thinking about Web services and Web APIs. Questions like when a web site should expose an API, what form the API should take and what technologies/protocols should be used are topics I've rehashed quite a lot in my head. Recently I came to the conclusion that if one is going to provide a Web service that is intended to be consumed by as many applications as possible, then one should consider exposing the API using multiple protocols. I felt that at least two protocols should be chosen: SOAP over HTTP (for the J2EE/.NET crowd) and Plain Old XML (POX) over HTTP (for the Web developer crowd).

However, I've recently started spending a bunch of time writing Javascript code for various Windows Live gadgets and I've begun to appreciate the simplicity of using JSON over parsing XML by hand in my gadgets. I've heard similar comments echoed by co-workers such as Matt, who's been spending a bunch of time writing Javascript code for Live Clipboard, and Yaron Goland, who's one of the minds working on the Windows Live developer platform. JSON has similar goals to XML-RPC and W3C XML Schema in that it provides a platform-agnostic way to transfer data encoded as structured types consisting of name<->value pairs and collections of name<->value pairs. It differs from XML-RPC by not getting involved in defining a mechanism for remote procedure calls, and from W3C XML Schema by being small, simple and focused.
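To put that in concrete terms, this is the sort of structure JSON is meant to carry; the names and values here are made up for illustration:

var person = {
    "firstName": "Yaron",                  // a name<->value pair
    "lastName": "Goland",
    "aliases": ["yarong", "YG"],           // a collection of values
    "employer": { "name": "Microsoft" }    // a nested structured type
};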

Once you start using JSON in your AJAX apps, it gets pretty addictive and it begins to seem like a hassle to parse XML, even when it's just plain old XML such as RSS feeds, not complex crud like SOAP packets. However, being an XML geek, there are a couple of things I miss from XML that I'd like to see in JSON, especially if its usage grows to become as widespread as XML's is on the Web today. Yaron Goland feels the same way and has started a series of blog posts on the topic.
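Here's roughly what that difference feels like in gadget code. This is only a sketch: the JSON feed shape (a top-level "items" array) is something I've made up for the comparison.

// XML path: pull item titles out of an RSS feed by walking the DOM.
function titlesFromRss(rssText) {
    var doc = new DOMParser().parseFromString(rssText, "text/xml");
    var items = doc.getElementsByTagName("item");
    var titles = [];
    for (var i = 0; i < items.length; i++) {
        titles.push(items[i].getElementsByTagName("title")[0].textContent);
    }
    return titles;
}

// JSON path: after one parse call it's just property access.
function titlesFromJson(jsonText) {
    var feed = JSON.parse(jsonText); // or eval("(" + jsonText + ")") where JSON.parse is unavailable
    var titles = [];
    for (var i = 0; i < feed.items.length; i++) {
        titles.push(feed.items[i].title);
    }
    return titles;
}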

In his blog post entitled Adding Namespaces to JSON, Yaron Goland writes:

The Problem

If two groups both create a name "firstName" and each gives it a different syntax and semantics how is someone handed a JSON document supposed to know which group's syntax/semantics to apply? In some cases there might be enough context (e.g. the data was retrieved from one of the group's servers) to disambiguate the situation but it is increasingly common for distributed services to be created where the original source of some piece of information can trivially be lost somewhere down the processing chain. It therefore would be extremely useful for JSON documents to be 'self describing' in the sense that one can look at any name in a JSON document in isolation and have some reasonable hope of determining if that particular name represents the syntax and semantics one is expecting.

The Proposed Solution

It is proposed that JSON names be defined as having two parts, a namespace name and a local name. The two are combined as namespace name + "." + local name to form a fully qualified JSON name. Namespace names MAY contain the "." character. Local names MUST NOT contain the "." character. Namespace names MUST consist of the reverse listing of subdomains in a fully qualified DNS name. E.g. org.goland or com.example.bigfatorg.definition.

To enable space savings and to increase both the readability and write-ability of JSON a JSON name MAY omit its namespace name along with the "." character that concatenated it to its local name. In this case the namespace of the name is logically set to the namespace of the name's parent object. E.g.

{ "org.goland.schemas.projectFoo.specProposal" :
"title": "JSON Extensions",
"author": { "firstName": "Yaron",
"com.example.schemas.middleName":"Y",
"org.goland.schemas.projectFoo.lastName": "Goland",
}
}

In the previous example the name firstName, because it lacks a namespace, takes on its parent object's namespace. That parent is author, which also lacks a namespace, so recursively author looks to its parent specProposal, which does have a namespace: org.goland.schemas.projectFoo. middleName introduces a new namespace, "com.example.schemas"; if its value were an object then the names in that object would inherit the com.example.schemas namespace. Because the use of the compression mechanism is optional, the lastName value can be fully qualified even though it shares the same namespace as its parent.
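To make sure I understood the inheritance rules, here's a quick sketch of mine (not from Yaron's post) of how a processor might expand the abbreviated names back into fully qualified ones:

// Expand abbreviated JSON names per the proposal: a dotted name carries its
// own namespace, a bare name inherits the namespace of its parent object's
// name. Arrays and simple values are left as-is for brevity.
function qualify(obj, parentNs) {
    var result = {};
    for (var name in obj) {
        var dot = name.lastIndexOf(".");
        var ns = (dot >= 0) ? name.substring(0, dot) : parentNs;
        var local = (dot >= 0) ? name.substring(dot + 1) : name;
        var value = obj[name];
        result[ns + "." + local] =
            (value !== null && typeof value === "object" && !Array.isArray(value))
                ? qualify(value, ns)  // members of a child object inherit its name's namespace
                : value;
    }
    return result;
}

// Applied to the example above, qualify(doc, null) yields names like
// "org.goland.schemas.projectFoo.firstName" and "com.example.schemas.middleName".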

My main problem with the above approach is echoed by the first comment in response to Yaron's blog post: the namespace scheme defined above isn't completely compatible with XML namespaces. This means that if I have a Web service that emits both XML and JSON, I'll have to use different namespace names for the same elements even though all that differs is the serialization format. Besides the disagreement over the syntax of the namespace names, I think this would be a worthwhile addition to JSON.
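To make the mismatch concrete: an XML vocabulary is identified by a URI, for which there is no canonical reverse-DNS equivalent, so the same element ends up with two unrelated namespace names depending on the wire format (the URI below is hypothetical):

// The same logical "middleName" element, named under each scheme.
var xmlName  = { namespaceURI: "http://schemas.example.com/people",
                 localName: "middleName" };         // XML namespaces
var jsonName = "com.example.schemas.middleName";    // Yaron's proposal
// Nothing maps one form to the other losslessly, so a service emitting
// both formats has to maintain two namespace vocabularies.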

In another blog post entitled Adding Extensibility to JSON Data Formats, Yaron Goland writes:

The Problem

How does one process JSON messages so that they will support both backwards and forwards compatibility? That is, how does one add new content into an existing JSON message format such that those who do not understand the extended content will be able to safely ignore it?

The Proposed Solution

In the absence of additional information providing guidance on how to handle unrecognized members, a JSON processor compliant with this proposal MUST ignore any members whose names are not recognized by the processor.

For example, if a processor was expecting to receive an object that contained a single member with the name "movieTitle" and instead it receives an object with multiple members including "movieTitle", "producer" and "director" then the JSON processor would, by default, act as if the "producer" and "director" members were not present.

An exception to this situation would be a member named "movie" whose value is an object where the semantics of the members of that object are "the local names of the members of this object are suitable for presenting as titles and their values as text under those titles". In that case, regardless of the processor's direct knowledge of the semantics of the members of the object (e.g. the processor may actually know about "movieTitle" but not "producer" or "director"), the processor can still process the unrecognized members because it has additional information about how to process them.

This requirement does not apply to incorrect usage of recognized names. For example, if the definition of an object only allowed a single "movieTitle" member then having two "movieTitle" members is simply an error and the ignore rule does not apply.

This specification does not require that ignored members be removed from the JSON structure. It is quite possible that other processors who will deal with the message may recognize members the current processor does not. Therefore it would make sense to let unrecognized members remain in the JSON structure so that others who process the structure may benefit from the extended information.
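Putting the must-ignore rule and this retention advice together, a compliant processor might look something like the following sketch (mine, not from Yaron's post; the member names come from the movie example above):

// Reads only the members it recognizes, acts as if the rest were absent,
// but leaves them in place for any downstream processor that knows more.
function processMovie(message) {
    var recognized = ["movieTitle"];
    for (var i = 0; i < recognized.length; i++) {
        var name = recognized[i];
        if (name in message) {
            console.log(name + ": " + message[name]);
        }
    }
    return message; // "producer" and "director" survive untouched
}

// processMovie({ "movieTitle": "Brazil", "producer": "P", "director": "D" })
// logs only movieTitle but returns the full object for later processors.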

Definition: Simple value - A value of a type other than array or object.

If a JSON processor encounters an array where it had expected to encounter a simple value the processor MUST retrieve the first simple value in the array and treat that as the value it was expecting and ignore the other elements in the array.
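In code, that array rule amounts to something like this sketch:

// Where a simple value is expected, take the first simple (non-array,
// non-object) element of an array and ignore the rest; null counts as simple.
function asSimpleValue(value) {
    if (!Array.isArray(value)) return value;
    for (var i = 0; i < value.length; i++) {
        var v = value[i];
        if (v === null || typeof v !== "object") return v;
    }
    return undefined; // the array held no simple value
}

// asSimpleValue("2.0")          → "2.0"
// asSimpleValue(["2.0", "3.0"]) → "2.0"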

Again, it looks like I'm going to go ahead and parrot the same feedback as a commenter on the original blog post. Defining an extensibility model where simple types can be converted to arrays in a future version seems like overkill and unnecessary complexity. It's not like it's that hard to add another field to the type. The other thing I wondered about this blog post is that it seems to define a problem that doesn't really exist. It's not like there are specialized JSON parsers in widespread use that barf if they see a field they don't understand. Requiring that the fields of various types be defined up front, or else barfing when encountering undefined fields over the wire, is primarily a limitation of statically typed languages and isn't really a problem for dynamic languages like JavaScript. Or am I missing something?