One significant problem with version #s in documents is that it is unclear what is being identified with the version #. The common misconception is that a version # indicates the version of the document for purposes of processing. That may be true, but in many cases it is an incomplete and useless assertion.
What is actually identified is usually the latest version of the processor that produced the document, and a promise of what version of documents the processor will accept in response. It is often a protocol version identifier, not just a format identifier. HTTP is a good example. The HTTP specification says "The protocol versioning policy is intended to allow the sender to indicate the format of a message and its capacity for understanding further HTTP communication, rather than the features obtained via that communication."
Because HTTP is a request-response protocol, the capacity for further HTTP communication is the crucial "version" information conveyed. Most documents that might have version #s do not fit that "future capacity" use case; they are just documents, with no protocol attached. There are cases where a document format is combined with a protocol, such as Atom. These combined protocol/format cases are fairly rare and not generally applicable.
In the case of just a document, it is unclear what a version # identifies. Imagine name language version 1 is first, last, and extensions; and compatible version 2 is first, last, optional middle, and extensions. If a name contains first and last, should the identifier be version 1 or 2? If a name has first, last, and middle, should the identifier be version 1 or 2? There is little gained by identifying the difference between the versions. Notice also the problem of "identifying" the version: Should the oldest or the newest version be the identifier?
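To make the ambiguity concrete, here is a minimal Python sketch of the name language. It assumes that "extensions" means any field a version does not recognize is allowed and set aside; the field sets and the helper names `read_as` and `valid_versions` are hypothetical, chosen only for illustration.

```python
# Hypothetical model of the two compatible versions of the name language.
REQUIRED = {"first", "last"}
KNOWN_FIELDS = {
    1: {"first", "last"},
    2: {"first", "last", "middle"},
}

def read_as(name, version):
    """Split a name into (known fields, extensions) for one version.

    Returns None if the name is not valid under that version.
    """
    if not REQUIRED <= set(name):
        return None
    known = {k: v for k, v in name.items() if k in KNOWN_FIELDS[version]}
    # Assumption: anything a version does not know is a legal extension.
    extensions = {k: v for k, v in name.items() if k not in KNOWN_FIELDS[version]}
    return known, extensions

def valid_versions(name):
    """List every version the name is valid under."""
    return [v for v in sorted(KNOWN_FIELDS) if read_as(name, v) is not None]

short_name = {"first": "Ada", "last": "Lovelace"}
long_name = {"first": "Ada", "middle": "King", "last": "Lovelace"}

print(valid_versions(short_name))  # [1, 2]
print(valid_versions(long_name))   # [1, 2]
```

Under these assumptions both names are valid under both versions; a version 1 reader just sees `middle` as an extension. Stamping either name with a single number tells the reader very little.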
Usually, the document contains the version number of the latest version of the language that the producing application understands. Thus the "newest" version is used as the identifier, even if the document itself is valid under older versions of the language.
Things all work fine if the producer and consumer are at the same version, or even if the consumer understands both the older and the newer version. But in forwards compatibility, what is a consumer that doesn't understand the newer version to do, and how does it know that it can treat the document as if it were an older version?
The fundamental problem with a version # in a document is that it doesn't provide for a given document to be valid under more than one version. What we really need is to be able to indicate a "space of versions" that a given document is valid under, whether that's a list or regexp or whatever.
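As a rough sketch of the "space of versions" idea, suppose a document carries a list of every version it is valid under rather than a single number. The function name `pick_version` and the use of plain integers for versions are assumptions made for this example, not part of any existing format.

```python
def pick_version(declared, understood):
    """Return the newest version that the document declares and the
    consumer understands, or None if there is no overlap."""
    common = set(declared) & set(understood)
    return max(common) if common else None

# A version 1 consumer receiving a document valid under versions 1 and 2
# can safely process it as version 1.
print(pick_version(declared=[1, 2], understood=[1]))  # 1

# With only a single version number (2), the same consumer has nothing to go on.
print(pick_version(declared=[2], understood=[1]))     # None
```

The second call is the forwards-compatibility problem in miniature: the document might well be valid under version 1, but a lone "2" gives the older consumer no way to know that.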
For years now, I've been asserting that the best way to do this with XML is to re-use the namespace name for compatible versions and to have consumers ignore anything they don't understand. That is, the namespace name does not change for compatible changes. The space of versions is the single namespace name. The consumer doesn't need to distinguish between versions; it simply ignores any extra items it doesn't understand.
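Here is a small sketch of what such a consumer might look like, using Python's standard `xml.etree.ElementTree`. The namespace name `http://example.org/name`, the element names, and the function `read_name` are all made up for the example. The consumer was written against the first version of the language and has never heard of `middle`.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name, shared by every compatible version.
NS = "http://example.org/name"

# The elements this (older) consumer understands, in ElementTree's
# "{namespace}localname" notation.
KNOWN = {f"{{{NS}}}first", f"{{{NS}}}last"}

def read_name(xml_text):
    """Collect the elements the consumer knows and ignore everything else."""
    root = ET.fromstring(xml_text)
    return {
        child.tag.split("}", 1)[1]: child.text
        for child in root
        if child.tag in KNOWN
    }

OLDER_DOC = f'<name xmlns="{NS}"><first>Ada</first><last>Lovelace</last></name>'
NEWER_DOC = (
    f'<name xmlns="{NS}"><first>Ada</first><middle>King</middle>'
    f"<last>Lovelace</last></name>"
)

print(read_name(OLDER_DOC))  # {'first': 'Ada', 'last': 'Lovelace'}
print(read_name(NEWER_DOC))  # {'first': 'Ada', 'last': 'Lovelace'}
```

Neither document carries a version number, and the consumer never asks for one. Because the namespace name is unchanged, the newer document is processed exactly like the older one, with the unknown `middle` element skipped.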
Another approach is to list all the versions of the language that the instance is valid under. Marc de Graauw wrote an interesting article in XML.com on how to use version numbers in XML.
What's a shame is that XML Namespaces makes associating a given name with multiple namespaces almost impossible. I'll shortly post something on what this could have looked like.