The wildcard construct that is the linchpin of XML Schema's extensibility model has a single facet for specifying what is allowed: a namespace. It has special values including ##any (any namespace), ##other (any namespace other than the targetNamespace), and ##targetNamespace.
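To make the "single facet" point concrete, here is a minimal Python sketch of how an XSD 1.0 wildcard's namespace attribute is evaluated, based on my reading of the XML Schema 1.0 rules (it also handles the ##local token). The function name `namespace_allowed` and the example URIs are hypothetical, not any validator's API. Notice that the function never receives the element's local name: the namespace is the only thing it can test.

```python
from typing import Optional

ABSENT = None  # models "no namespace", i.e. an unqualified element name


def namespace_allowed(ns: Optional[str], constraint: str,
                      target_ns: Optional[str]) -> bool:
    """Return True if an element whose namespace is `ns` matches an xs:any
    whose namespace attribute is `constraint` (XSD 1.0 semantics)."""
    if constraint == "##any":
        return True
    if constraint == "##other":
        # "Not the target namespace"; unqualified names are excluded too.
        return ns is not ABSENT and ns != target_ns
    # Otherwise: a whitespace-separated list of namespace URIs, possibly
    # mixed with the tokens ##targetNamespace and ##local.
    allowed = set()
    for token in constraint.split():
        if token == "##targetNamespace":
            allowed.add(target_ns)
        elif token == "##local":
            allowed.add(ABSENT)
        else:
            allowed.add(token)
    return ns in allowed


TNS = "urn:example:v1"  # hypothetical target namespace
print(namespace_allowed("urn:example:ext", "##other", TNS))  # True
print(namespace_allowed(TNS, "##other", TNS))                # False
```

Because ##other rejects the target namespace wholesale, there is no way to say "the target namespace, except for the names I have already declared".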
The use of namespaces as the level of granularity seems rational and obvious. But the problem is that, from a versioning perspective, namespaces aren't nearly fine-grained enough. We simply can't write an XML Schema for a multi-namespace language that has extensibility. You need a "pseudo-schema", which is what WSDL 2.0 has done.
Even fixing XML Schema with low-priority wildcards doesn't solve this problem, because a low-priority wildcard still can't exclude the known items in a namespace. It ends up allowing things that you'd want to exclude.
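Here is a sketch of why a low-priority wildcard still falls short. It assumes a simplified model in which the element declarations expected at a position are tried first and the wildcard only catches whatever is left. `attribute_weak`, the namespace `urn:example:lang`, and the element names are all hypothetical.

```python
from typing import Callable, Optional, Set, Tuple

QName = Tuple[Optional[str], str]  # (namespace URI or None, local name)


def attribute_weak(name: QName, expected_here: Set[QName],
                   wildcard_ns_ok: Callable[[Optional[str]], bool]) -> str:
    """Attribute an element at one position, letting the element
    declarations expected there win over the (low-priority) wildcard."""
    if name in expected_here:
        return "declaration"
    if wildcard_ns_ok(name[0]):  # the wildcard still only sees the namespace
        return "wildcard"
    return "invalid"


LANG = "urn:example:lang"      # hypothetical language namespace
expected = {(LANG, "input")}   # what the content model expects at this point


def any_namespace(ns: Optional[str]) -> bool:
    return True


def other_namespace(ns: Optional[str]) -> bool:
    return ns is not None and ns != LANG


# (LANG, "binding") is declared elsewhere in the language, yet the
# wildcard accepts it here because its namespace is allowed.
print(attribute_weak((LANG, "binding"), expected, any_namespace))     # wildcard
# Excluding the whole namespace instead also rejects a genuine
# first-party extension such as a newly added (LANG, "priority").
print(attribute_weak((LANG, "priority"), expected, other_namespace))  # invalid
```

Either way, a namespace-only test forces a choice between admitting known elements and rejecting first-party extensions.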
In my December 2003 XML.com versioning article, I first suggested "Additionally, a wildcard that only allowed elements that had not been defined -- effectively other namespaces plus anything not defined in the target namespace -- is another useful model." I need to update this slightly to read "anything not defined in the current schema's namespaces: the target namespace plus any imported namespaces".
What we need is a wildcard construct that allows only extensions that are not in the current schema. This means allowing elements that are in the current namespaces but aren't defined, allowing elements in any new namespace, and excluding elements that are already defined in the current namespaces.
This solution allows us to get around the Unique Particle Attribution (UPA) rule, and it allows extensibility in between elements.
In my entry on Providing Compatible Schema Evolution, I mentioned a couple of possible wildcard expression solutions that Schema could examine: Allowing undefined elements only, and Namespace name variability.
At first, I thought these were minor problems that we could live with. But I'm increasingly convinced that without more variability in wildcards, the deployment of complicated multi-namespace schemas will remain severely limited.
The core of the problem is that we have one construct that we must use both for third-party extensibility and for first-party extensibility, commonly thought of as versioning. If I own a namespace and I change it, we commonly think of this as versioning. If somebody else adds their own elements to an instance, we think of this as extensibility.
But wait! That's not the way it really goes. What often happens (take a look at almost all the Web services specs) is that multiple namespaces embody the meaning of a given language. WSDL is a great example of multiple namespaces in one vocabulary: the WSDL 2.0 language uses about three different namespaces, and it would probably like a schema that covers them. We can't write a full XML Schema for WSDL 2.0 because the wildcard can't differentiate between the namespaces that are part of WSDL 2.0 and those that aren't.
Please realize what this means: If you write a language that uses multiple namespaces, you can't write a Schema and have extensibility. Period.
Interestingly, the schema could know what is part of the vocabulary, because XML Schema has modularity mechanisms like xs:import. If we had a wildcard that let us say things like "allow extensions in namespaces that aren't the known namespaces", where known means the targetNamespace plus imports, or "allow extensions in namespaces that aren't part of WSDL 2.0", then we could write an XML Schema.
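A sketch of how a validator could compute the "known" namespaces from the target namespace and the xs:import graph, assuming each schema document is reduced to its target namespace and the namespaces it imports. `SchemaDoc`, `known_namespaces`, `outside_known_namespaces`, and the URIs are hypothetical illustrations, not part of any schema processor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SchemaDoc:
    target_ns: Optional[str]
    imports: List[str] = field(default_factory=list)  # namespaces named by xs:import


def known_namespaces(root: SchemaDoc,
                     registry: Dict[str, SchemaDoc]) -> Set[Optional[str]]:
    """Target namespace plus every namespace reachable through xs:import."""
    known: Set[Optional[str]] = set()
    stack = [root]
    while stack:
        doc = stack.pop()
        if doc.target_ns in known:
            continue  # already visited; this also stops import cycles
        known.add(doc.target_ns)
        for ns in doc.imports:
            if ns in registry:
                stack.append(registry[ns])  # follow that document's imports too
            else:
                known.add(ns)  # imported, but no document to descend into


    return known


def outside_known_namespaces(ns: Optional[str], known: Set[Optional[str]]) -> bool:
    """Hypothetical wildcard: admit only namespaces the schema does not know."""
    return ns not in known


CORE = "urn:example:core"  # hypothetical namespaces of one language
EXT_A = "urn:example:ext-a"
EXT_B = "urn:example:ext-b"
core = SchemaDoc(CORE, [EXT_A, EXT_B])
registry = {EXT_A: SchemaDoc(EXT_A, [CORE]), EXT_B: SchemaDoc(EXT_B)}

known = known_namespaces(core, registry)
print(sorted(known))  # ['urn:example:core', 'urn:example:ext-a', 'urn:example:ext-b']
print(outside_known_namespaces("urn:thirdparty", known))  # True
print(outside_known_namespaces(EXT_A, known))             # False
```

This namespace-level test is enough for third-party extensions, but it still can't admit a new name added to one of the known namespaces.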
There are some requirements that are being called out here:
1) A schema can be constructed from multiple namespaces.
2) A language may evolve within a given namespace name.
3) A language may evolve by adding more namespace names and the names defined within them.
4) A schema can allow extensibility of things outside the current language definition. This does not differentiate between first-party and third-party extensions.
5) Extensibility should not allow known terms to be included at the extensibility point. By definition, these aren't extensions.
What we need is a different "set" for the extensibility point. We need a wildcard that effectively says "allow anything that is not defined in my language", i.e., anything outside the set of QNames for my elements.
The simplest solution I can think of to meet this requirement is adding another attribute to the xs:any. This attribute could be called "elements". It has a special value ##unknown. Thus what I'd like to see is
<xs:any elements="##unknown"/>.
The definition of ##unknown is any element that is not defined by the composite schema that is generated as a result of all the import processing.
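A minimal sketch of the proposed ##unknown test, assuming the composite schema can be flattened into the set of element QNames it declares (whether local declarations count is itself an assumption here). Note that `elements="##unknown"` is the proposal above, not an attribute XML Schema supports; `composite_qnames`, `matches_unknown`, and the URIs are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

QName = Tuple[Optional[str], str]  # (namespace URI or None, local name)


def composite_qnames(
        docs: Iterable[Tuple[Optional[str], Iterable[str]]]) -> Set[QName]:
    """Union of the element QNames declared by every schema document that
    import processing brought into the composite schema."""
    return {(ns, local) for ns, local_names in docs for local in local_names}


def matches_unknown(name: QName, known: Set[QName]) -> bool:
    """Proposed ##unknown: admit any element the composite schema does not define."""
    return name not in known


CORE = "urn:example:core"  # hypothetical namespaces
EXT = "urn:example:ext"
v1 = composite_qnames([(CORE, ["service", "endpoint"]), (EXT, ["policyRef"])])

print(matches_unknown((CORE, "priority"), v1))           # True: new name, known namespace
print(matches_unknown(("urn:thirdparty", "audit"), v1))  # True: new namespace
print(matches_unknown((CORE, "endpoint"), v1))           # False: already defined
```

If a later version of the language adds `priority` to the core namespace, a consumer validating with the first version's schema still accepts it at the extensibility point (requirements 2 and 3), while already-defined names are kept out (requirement 5).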
I thought that a special value for the namespace attribute might work, but this is really about a set of namespace names and, potentially, a subset of the names within each namespace.
I also considered the usual pattern that a vocabulary typically consists of a set of related namespace names (this was the namespace variability option I listed), but that doesn't seem as useful as distinguishing "known" from "unknown".
I think this solution gets around the UPA rule because there's no confusion between something that's known and something that's unknown: either the schema has a definition, or it doesn't.
And it allows extensibility between elements, because every element can still be attributed to exactly one particle, so the UPA rule is still satisfied.
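A simplified sketch of the UPA argument, assuming a content model of element `a`, then an optional, repeatable wildcard, then an optional element `b`. After `a`, an incoming element could be matched either by the wildcard or by `b`, so the check below only asks whether the wildcard admits `b`; a real UPA check covers the whole content model. All names and URIs are hypothetical.

```python
from typing import Callable, List, Optional, Set, Tuple

QName = Tuple[Optional[str], str]
Wildcard = Callable[[QName], bool]

CORE = "urn:example:core"              # hypothetical target namespace
KNOWN: Set[QName] = {(CORE, "a"), (CORE, "b")}


def upa_conflicts(wildcard: Wildcard, competing: List[QName]) -> List[QName]:
    """Declarations that compete with the wildcard at the same position and
    that the wildcard also admits; any result means a UPA violation."""
    return [q for q in competing if wildcard(q)]


def target_ns_wildcard(q: QName) -> bool:   # like namespace="##targetNamespace"
    return q[0] == CORE


def other_wildcard(q: QName) -> bool:       # like namespace="##other"
    return q[0] is not None and q[0] != CORE


def unknown_wildcard(q: QName) -> bool:     # the proposed elements="##unknown"
    return q not in KNOWN


competing = [(CORE, "b")]  # after <a/>, an element may match the wildcard or <b/>
print(upa_conflicts(target_ns_wildcard, competing))  # [('urn:example:core', 'b')]
print(upa_conflicts(other_wildcard, competing))      # []
print(upa_conflicts(unknown_wildcard, competing))    # []
```

The ##other wildcard avoids the conflict too, but only by rejecting new first-party names such as `(CORE, "c")`, which the ##unknown predicate admits.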
I hope the Schema WG will include these scenarios - multi-ns, extensions in those multi-ns, excluding known elements from extensibility, and allowing extensibility between elements - into their deliberations.