What do Version Identifiers identify?

One significant problem with version #s in documents is that it is unclear what the version # actually identifies. The common misconception is that a version # indicates the version of the document for purposes of processing. That may be true, but in many cases it is an incomplete assertion, and so not a useful one.

What is actually identified is usually the latest version of the processor that produced the document, and a promise of what version of documents the processor will accept in response. It is often a protocol version identifier, not just a format identifier. HTTP is a good example. The HTTP specification says "The protocol versioning policy is intended to allow the sender to indicate the format of a message and its capacity for understanding further HTTP communication, rather than the features obtained via that communication."

Because HTTP is a request-response protocol, the capacity for further HTTP communication is the crucial "version" information conveyed. Most documents that might have version #s do not fit that "future capacity" use case; they are just documents with no associated protocol. There are cases where a document format is combined with a protocol, such as Atom. These combined protocol/format cases are fairly rare and not generally applicable.

Documents
In the case of just a document, it is unclear what a version # identifies. Imagine that name language version 1 is first, last, and extensions, and that the compatible version 2 is first, last, optional middle, and extensions. If a name contains first and last, should the identifier be version 1 or 2? If a name has first, last, and middle, should the identifier be version 1 or 2? There is little gained by identifying the difference between the versions. Notice also the problem of "identifying" the version: should the oldest or the newest version be the identifier?
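To make the ambiguity concrete, here is a minimal Python sketch. The `LANGUAGE` table, the `valid_versions` function, and the sample names are all made up for illustration, and the open-ended "extensions" slot is modelled only as an optional ignore-unknown flag, which is an assumption about how a version 1 consumer would treat a middle name.

```python
# Hypothetical description of the two versions of the "name language".
LANGUAGE = {
    1: {"required": {"first", "last"}, "optional": set()},
    2: {"required": {"first", "last"}, "optional": {"middle"}},
}


def valid_versions(name, ignore_unknown=False):
    """Return every version number under which the name is acceptable.

    With ignore_unknown=True, unrecognized fields are treated as
    extensions and skipped instead of making the name invalid.
    """
    fields = set(name)
    result = []
    for version, spec in sorted(LANGUAGE.items()):
        if not spec["required"] <= fields:
            continue  # a required field is missing
        unknown = fields - spec["required"] - spec["optional"]
        if unknown and not ignore_unknown:
            continue  # strict reading: unknown fields are errors
        result.append(version)
    return result


short_name = {"first": "Ada", "last": "Lovelace"}
long_name = {"first": "Ada", "middle": "King", "last": "Lovelace"}

print(valid_versions(short_name))                     # [1, 2]
print(valid_versions(long_name))                      # [2]
print(valid_versions(long_name, ignore_unknown=True)) # [1, 2]
```

In two of the three cases the instance is acceptable under both versions, so a single version # has to pick one of them more or less arbitrarily.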

Usually, the document contains the version number of the latest version of the language that the producing application understands. Thus the "newest" version of the identifier is used, even if the document itself is valid under older versions of the language.

Things all work fine if the producer and consumer are at the same version, or even if the consumer understands both the older and the newer version. But for forwards compatibility, what is a consumer that doesn't understand the newer version to do, and how does it know that it can treat the document as if it were an older version?

The fundamental problem with a version # in a document is that it doesn't provide for a given document to be valid under more than one version. What we really need is to be able to indicate a "space of versions" that a given document is valid under, whether that's a list or regexp or whatever.
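As a rough sketch of the "space of versions" idea, assume the document carries a list of version numbers it is valid under (the `versions` field and the `pick_version` helper are hypothetical, not part of any existing format). A consumer then only needs one overlap with the versions it understands:

```python
def pick_version(declared, understood):
    """Return the newest version present in both collections, or None.

    declared:   versions the document says it is valid under
    understood: versions this consumer knows how to process
    """
    common = set(declared) & set(understood)
    return max(common) if common else None


# A document that is valid under both version 1 and version 2.
document = {"versions": [1, 2], "first": "Ada", "last": "Lovelace"}

# An old consumer that only understands version 1 can still proceed.
print(pick_version(document["versions"], understood={1}))  # 1

# A document valid only under version 2 gives that consumer no overlap.
print(pick_version([2], understood={1}))  # None
```

The old consumer no longer has to guess whether a "version 2" document is safe to treat as version 1; the document says so itself.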

For years now, I've been asserting that the best way to do this with XML is to re-use the namespace name for compatible versions and to have the consumer ignore anything it doesn't understand. That is, the namespace name doesn't change for compatible changes. The space of versions is the single namespace name. The consumer doesn't need to distinguish between versions, and it ignores any extra items it doesn't understand.
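Here is a small sketch of what that looks like for a consumer, using Python's standard `xml.etree.ElementTree`. The namespace name `http://example.org/name`, the element names, and the `read_name` function are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/name"  # hypothetical namespace name, shared by v1 and v2
KNOWN = {"first", "last"}       # what a "version 1" consumer understands


def read_name(xml_text):
    """Read the elements this consumer knows; ignore everything else."""
    root = ET.fromstring(xml_text)
    prefix = "{" + NS + "}"
    if root.tag != prefix + "name":
        # A different namespace name signals an incompatible language.
        raise ValueError("unrecognized namespace or root element")
    result = {}
    for child in root:
        if child.tag.startswith(prefix):
            local = child.tag[len(prefix):]
            if local in KNOWN:
                result[local] = child.text
        # Unknown elements, in this or any other namespace, are skipped.
    return result


# A "version 2" instance: same namespace name, one extra element.
V2_DOC = """<name xmlns="http://example.org/name">
  <first>Ada</first><middle>King</middle><last>Lovelace</last>
</name>"""

print(read_name(V2_DOC))  # {'first': 'Ada', 'last': 'Lovelace'}
```

The consumer never looks at a version # at all. It recognizes the namespace name, takes what it understands, and skips the middle name.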

Another approach is to list all the versions of the language that the instance is valid under. Marc de Graauw wrote an interesting article in XML.com on how to use version numbers in XML.

What's a shame is that XML Namespaces makes associating a given name with multiple namespaces almost impossible. I'll shortly post something on what this could have looked like.

Comments (1)

Thanks for the kind words on my article.

David: "What's a shame is that xml namespaces makes this problem of associating a given name with multiple namespaces almost impossible."

I don't think my approach requires associating a name with multiple namespaces. If you take your approach, using the same namespace for compatible (mayIgnore) changes and a new namespace for incompatible (mustUnderstand) changes, you require the receiver to understand all namespaces used (where I require understanding one or more versions), and you give the receiver the liberty to ignore unknowns _within a known namespace_ (which I do by not requiring the version they were defined in).

It won't work for content (which is practical in the case of code values) as in my last example, but other than that the mechanics seem the same.

(The namespace approach also will not work if the "mayIgnore" changes come from a different namespace, i.e. I include some FOAF elements in my L2 which I believe may safely be ignored.)

Looking forward to the next post you mention.

About

This page contains a single entry from the blog posted on April 19, 2007 11:47 AM.