Dare Obasanjo has published an article on designing extensible, versionable XML formats, and it heavily references an article that I published on Versioning XML vocabularies.
I've had a few people ask me about the differences between the articles, and the reasons for those differences, so I thought I would compare and contrast them. Firstly, let me say that I'm really pleased to see Dare's article and I was glad to be a reviewer. In general, I think there is a lot of overlap between the articles. That is a good thing: we agree, and hopefully the industry adopting common guidelines will help everybody. And where we disagree, I think that the differences aren't that significant compared to the areas where we do agree.
Where we agree is roughly that vocabularies should/must:
- be designed for backwards and forwards compatibility,
- provide for extensibility of attributes and elements,
- use a Must Ignore unknown extensions rule as the simplest way to achieve forwards compatibility,
- use or provide a Must Understand rule to override Must Ignore,
- use a Schema design technique if it is desirable to write compatible V2 Schemas.
We both state that using a new namespace for new constructs suffers from a significant problem: forwards compatible schemas cannot be written if all new constructs are placed in a new namespace. This is because of the limitations of the schema wildcard, which can't express any namespace constraint finer than "##other". If a namespace owner uses another namespace for an extension, and they want to retain extensibility, then they can't write a new schema because of non-determinism: the new optional element and the "##other" wildcard that follows it could both match the same element. I offered an Extension Element technique, and Dare offers a delimiter technique, for writing schemas that are forwards and backwards compatible.
There are five main areas of difference.
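To make the non-determinism concrete, here is a rough Python sketch. It is not a real schema processor and not the full Unique Particle Attribution check from XML Schema: it only looks at adjacent particles in a sequence, and it simplifies "##other" to "any namespace except the target". The namespaces, element names and helper functions are all made up for illustration.

```python
from dataclasses import dataclass

TARGET_NS = "urn:example:name"        # hypothetical target namespace
EXT_NS = "urn:example:name-ext"       # hypothetical namespace for new constructs

@dataclass(frozen=True)
class Element:
    ns: str
    name: str
    min_occurs: int = 1
    max_occurs: float = 1

@dataclass(frozen=True)
class Wildcard:
    namespace: str                    # "##any", "##other" or "##targetNamespace"
    min_occurs: int = 0
    max_occurs: float = float("inf")

def wildcard_accepts(wc, ns):
    """Simplified namespace constraint check for a wildcard."""
    if wc.namespace == "##any":
        return True
    if wc.namespace == "##other":
        return ns != TARGET_NS
    return ns == TARGET_NS            # "##targetNamespace"

def overlaps(a, b):
    """True if some element could be matched by both particles."""
    if isinstance(a, Element) and isinstance(b, Element):
        return (a.ns, a.name) == (b.ns, b.name)
    if isinstance(a, Element):
        return wildcard_accepts(b, a.ns)
    if isinstance(b, Element):
        return wildcard_accepts(a, b.ns)
    return "##any" in (a.namespace, b.namespace) or a.namespace == b.namespace

def ambiguous_pairs(sequence):
    """Adjacent particles where the first has a variable count and both
    could match the same element (a simplified determinism check)."""
    return [
        (a, b)
        for a, b in zip(sequence, sequence[1:])
        if a.min_occurs < a.max_occurs and overlaps(a, b)
    ]

first = Element(TARGET_NS, "first")
last = Element(TARGET_NS, "last")
any_other = Wildcard("##other")

# V1: known elements followed by a wildcard for other namespaces.
v1 = [first, last, any_other]

# V2 attempt: the owner adds an optional element in a NEW namespace
# and tries to keep the wildcard for further extensions.
middle_ext = Element(EXT_NS, "middle", min_occurs=0)
v2_new_namespace = [first, last, middle_ext, any_other]

# V2 attempt: the optional element is added in the TARGET namespace instead.
middle_tns = Element(TARGET_NS, "middle", min_occurs=0)
v2_same_namespace = [first, last, middle_tns, any_other]

print(ambiguous_pairs(v1))                     # no conflicts
print(len(ambiguous_pairs(v2_new_namespace)))  # one conflict: middle vs. wildcard
print(ambiguous_pairs(v2_same_namespace))      # no conflicts
```

In this toy model the same-namespace V2 passes the check, but only because the "##other" wildcard never admitted target namespace elements in the first place, which is why a further technique is needed to leave room for target namespace extensions.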
The need for Schema V2 for forwards compatible changes
I believe that writing Schemas for V2 is important. Therefore version additions should be done in the same namespace, and a Schema technique for compatible versioning should be used. Dare contrasts the techniques of using a new namespace versus re-using the namespace for versions. I think that there is a trade-off to be made, but I wanted to argue strongly in favour of the ability to write a V2 Schema. I don't think Dare hits the point hard enough that you can't write a V2 Schema containing compatible changes if you use the new-namespaces-for-new-constructs approach. He describes the problem of forwards compatibility as a secondary drawback, and then doesn't say that a forwards compatible Schema simply can't be written using just wildcards. When the namespace name is re-used for new constructs, either of our Schema techniques solves this problem.
Forwards compatible Schema technique
I believe the technique for writing a schema that can be versioned in a forwards compatible way is the most significant difference between the articles. Dare uses a "delimiter" technique, which puts the target namespace extensions inline, and I advocate an Extension element. I like Dare's technique a lot and don't care which technique is used. My technique was developed last summer, and finally got published in December. It's clear that my technique influenced a number of people, and if it helped spark a better technique, then that's a win in my book. From now on, I plan on talking about both techniques in the context of writing schemas for forwards compatibility.
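As an instance-level illustration of the Extension element idea, here is a minimal Python sketch. The name vocabulary, the namespace URIs and the reader functions are assumptions invented for this example. It shows only how a V1 and a V2 consumer might read the same document, not how the schema itself is written.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:name"            # hypothetical target namespace
OTHER = "urn:example:thirdparty"   # hypothetical third-party namespace

# A V2 document: the new target namespace element "middle" travels inside
# an Extension element, and a third-party element follows it.
V2_DOC = f"""
<name xmlns="{NS}" xmlns:o="{OTHER}">
  <first>Jane</first>
  <last>Doe</last>
  <Extension>
    <middle>Q</middle>
  </Extension>
  <o:nickname>JD</o:nickname>
</name>
"""

def read_v1(xml_text):
    """A V1 consumer: knows only first and last, ignores everything else."""
    root = ET.fromstring(xml_text.strip())
    return {
        "first": root.findtext(f"{{{NS}}}first"),
        "last": root.findtext(f"{{{NS}}}last"),
    }

def read_v2(xml_text):
    """A V2 consumer: additionally looks inside the Extension element."""
    root = ET.fromstring(xml_text.strip())
    result = read_v1(xml_text)
    result["middle"] = root.findtext(f"{{{NS}}}Extension/{{{NS}}}middle")
    return result

print(read_v1(V2_DOC))  # {'first': 'Jane', 'last': 'Doe'}
print(read_v2(V2_DOC))  # {'first': 'Jane', 'last': 'Doe', 'middle': 'Q'}
```

The V1 reader keeps working against the V2 document because the addition sits in a place it was always prepared to skip.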
Why is this hard
I spent some time talking about why the various techniques must be used, and how XML Schema could make our lives easier. I think that disseminating this material has materially helped: in my blog entry on Schema 1.1 and Versioning, I point out what the very recent Schema 1.1 WD has done.
Necessity of mustUnderstand for re-using namespace names
Dare states that a mustUnderstand model is not necessary for the re-use namespace name approach (what he calls Using Version Extensibility Points). I disagree with this. It is true that a namespace name owner does not need the mustUnderstand model when mustIgnore is the default, as they can simply change the namespace name to indicate a requirement. But mustUnderstand is still necessary for 3rd parties to indicate that their extensions are required. I had spent some time in my original article talking about languages as containers (i.e. SOAP), but I had to prune that material.
Dare also suggests that a "Must Fault on unknown extensions in target ns" is a good rule. But this precludes a namespace owner from making a forwards compatible change! If a namespace owner adds an element and a client breaks because it hasn't seen that element before, then that is a breaking change.
This is why I believe that the Must Ignore unknown extensions rule applies to all namespaces, and that an explicit Must Understand rule is necessary to allow 3rd parties to indicate mandatory extensions.
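A small Python sketch of how the two rules can fit together in a receiver. Everything here is hypothetical: the order vocabulary, the namespaces, and in particular the mustUnderstand attribute, which stands in for whatever flag a container language chooses to define.

```python
import xml.etree.ElementTree as ET

CORE = "urn:example:order"                 # hypothetical vocabulary namespace
MU_ATTR = f"{{{CORE}}}mustUnderstand"      # hypothetical flag defined by the vocabulary
KNOWN = {f"{{{CORE}}}item", f"{{{CORE}}}quantity"}

class NotUnderstood(Exception):
    """Raised when a mandatory extension is not recognized."""

OPTIONAL_DOC = """
<order xmlns="urn:example:order" xmlns:x="urn:example:thirdparty">
  <item>widget</item>
  <quantity>2</quantity>
  <x:giftWrap>yes</x:giftWrap>
</order>
"""

REQUIRED_DOC = """
<order xmlns="urn:example:order" xmlns:o="urn:example:order"
       xmlns:x="urn:example:thirdparty">
  <item>widget</item>
  <quantity>2</quantity>
  <x:escrow o:mustUnderstand="true">required</x:escrow>
</order>
"""

def process(xml_text):
    root = ET.fromstring(xml_text.strip())
    understood = {}
    for child in root:
        if child.tag in KNOWN:
            # Strip the "{namespace}" prefix to get the local name.
            understood[child.tag.split("}")[1]] = child.text
        elif child.get(MU_ATTR) == "true":
            # Must Understand overrides the default: fail rather than skip.
            raise NotUnderstood(child.tag)
        # Otherwise Must Ignore applies, whatever namespace the child is in.
    return understood

print(process(OPTIONAL_DOC))  # {'item': 'widget', 'quantity': '2'}
try:
    process(REQUIRED_DOC)
except NotUnderstood as err:
    print("rejected:", err)
```

The third party never has to touch the vocabulary's namespace: it marks its own extension as mandatory, and receivers that don't know it fail instead of silently dropping it.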
MUSTs versus SHOULDs
The last difference is the use of MUST versus SHOULD in rules or guidance. In general, I used SHOULD and Dare used MUST. I stand by my original SHOULDs. In general, a SHOULD basically means a MUST unless there is some compelling reason not to. FWIW, I flip-flopped about 3 times on this issue in my article.
For example, the Provide Processing Model rule is a SHOULD. Dare wrote it as a MUST. The problem that I had with MUST was that many authors do not provide a processing model. Clearly they don't have to provide a processing model. I interpreted SHOULD in this context to mean "do x if you want to get benefit y".
The rough comparison of my rules versus Dare's:
Mine: 1. Allow Extensibility rule: Languages SHOULD be designed for extensibility.
Dare: XML formats should be designed to be extensible.
Dare added a rule that I thought was obvious:
Dare: Extensions must not use the namespace of the XML format.
Then I made the use of Extension elements part of a rule:
2. Any Namespace rule: The extensibility point SHOULD allow for extension in any namespace. For XML Schema applications, the extensibility point SHOULD be an element that allows for extension in the target namespace and a wildcard that allows for extension in any other namespace.
3. Full Extensibility rule: All XML Elements SHOULD allow for element extensibility after element definitions, and allow any attributes.
Dare: All XML elements in the format should allow any extension attributes, and elements with complex content should allow for extension elements as children.
4. Provide Processing Model Rule: Languages SHOULD specify a processing model for dealing with extensions.
Dare: Formats that support extensibility must specify a processing model for dealing with extensions.
I provided a definition of "Must Ignore" in rules 5-7, and Dare listed it as an option. Roughly the same thing.
5. Must Ignore Rule: Document receivers MUST ignore any XML attributes or elements in a valid XML document that they do not recognize.
Namespace management is next:
8. Re-use namespace names Rule: If a backwards compatible change can be made to a specification, then the old namespace name SHOULD be used in conjunction with XML's extensibility model.
Dare: If the next version of a format is backward compatible with previous versions, then the old namespace name must be used in conjunction with XML's extensibility model.
9. New namespaces to break Rule: A new namespace name is used when backwards compatibility is not permitted, that is software MUST break if it does not understand the new language components.
Dare: A new namespace name must be used when backward compatibility is not permitted. That is, software must break if it does not understand the new language components.
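A minimal Python sketch of what "software must break" can look like in a receiver, assuming made-up namespace names: the receiver checks the namespace of the root element before doing anything else and refuses documents in a namespace it does not know.

```python
import xml.etree.ElementTree as ET

SUPPORTED_NS = "urn:example:order"   # hypothetical namespace this receiver was built for

def namespace_of(tag):
    """Extract the namespace from an ElementTree tag such as '{ns}local'."""
    return tag[1:].split("}")[0] if tag.startswith("{") else ""

def accept(xml_text):
    root = ET.fromstring(xml_text)
    ns = namespace_of(root.tag)
    if ns != SUPPORTED_NS:
        # A new namespace name signals an incompatible language: stop here.
        raise ValueError(f"unsupported namespace: {ns!r}")
    return root

accept('<order xmlns="urn:example:order"/>')       # accepted
try:
    accept('<order xmlns="urn:example:order2"/>')  # hypothetical incompatible version
except ValueError as err:
    print(err)
```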
10. Provide mustUnderstand Rule: Container languages SHOULD provide a "mustUnderstand" model for dealing with optionality of extensions that override a default Must Ignore Rule.
Dare: Formats should specify a mustUnderstand model for dealing with backward-incompatible changes to the format that don't change the namespace name.
I have a rule about determinism, which really is a MUST :-)
11. Be Deterministic rule: Use of wildcards MUST be deterministic. Location of wildcards, namespace of wildcard extensions, minOccurs and maxOccurs values are constrained, and type restriction is controlled.
It's clear that the rules are quite similar, and a designer won't go wrong if they follow either set of rules.
As I said before, I'm really glad to see Dare's article. I hope it helps even more people understand how to write XML languages that can be extensible and versionable, so that we can get closer to the dream of building loosely coupled systems.