Examining wildcards for versioning

Author: Dave Orchard, Jan 8th 2004

Introduction

A number of specifications use wildcards for extensibility and evolution. This is a good thing. However, there is a prevalent design choice that has significant limitations. The use of wildcards at the same level as element content precludes many of the common validation features authors need for loosely coupled systems. The specific problem is that extensions cannot be validated correctly, so documents that should be rejected pass validation.

To illustrate this problem, let us start with a person description that contains a name. The schema has two extensibility points, each using the common wildcard with namespace="##other":

<xs:complexType name="person">
  <xs:sequence>
    <xs:element name="name" type="tns:name" />
    <xs:any namespace="##other" />
  </xs:sequence>
</xs:complexType>
<xs:complexType name="name">
  <xs:sequence>
    <xs:element name="first" type="xs:string" />
    <xs:element name="last" type="xs:string" minOccurs="0" />
    <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

The author of the person namespace decides that they want to extend their type by adding a middle name to the name and a city to the person type. Hence they create 2 new schema types, for middle and city, which look like:

<!-- Schema for midns -->
<xs:simpleType name="middle">
  <xs:restriction base="xs:string"/>
</xs:simpleType>
<!-- Schema for cityns -->
<xs:simpleType name="city">
  <xs:restriction base="xs:string"/>
</xs:simpleType>

They can create documents that have a middle name and a city in them. Now what does the author do about the schema for the extended person?

No person schema change

Some authors believe that namespaces shouldn't change, and that revisions to a schema for a given namespace are outlawed. The processing model that their validator uses is that if an extension is allowed in the schema and an element appears in an instance document, then they validate that element against any schema they can find. In this case, the middle element is in the midns namespace, so validation of midns:middle elements can occur.

The problem is that there is no way for the person schema author who has extended the name with middle to say where the middle is allowed and where it is not allowed. The person schema author cannot validate the location of middle or city elements. Using the previous processing model, the following document is valid, even though the city appears inside the name and the middle name appears directly under the person:

<pns:person>
  <pns:name>
    <pns:first>Dave</pns:first>
    <pns:last>Orchard</pns:last>
    <cityns:city>Vancouver</cityns:city>
  </pns:name>
  <midns:middle>B</midns:middle>
</pns:person>
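The processing model described above can be sketched roughly in Python. This is only an illustration under stated assumptions: the namespace URIs are hypothetical (the article gives only prefixes), the validator is assumed to have found global element declarations for middle and city, and only the wildcard step is modelled, not full schema validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the article only gives the prefixes.
PNS = "urn:example:person"
MIDNS = "urn:example:middle"
CITYNS = "urn:example:city"

# Assumption: the validator has found element declarations for these names.
KNOWN_ELEMENTS = {MIDNS: {"middle"}, CITYNS: {"city"}}

# Both types in the person schema carry an ##other wildcard.
WILDCARD_PARENTS = {f"{{{PNS}}}person", f"{{{PNS}}}name"}


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def extension_errors(root):
    """Return the tags of extension elements that no known schema declares."""
    errors = []
    for parent in root.iter():
        if parent.tag not in WILDCARD_PARENTS:
            continue
        for child in parent:
            namespace, local = split_tag(child.tag)
            if namespace in ("", PNS):
                continue  # not matched by ##other; left to the person schema
            if local not in KNOWN_ELEMENTS.get(namespace, set()):
                errors.append(child.tag)
    return errors


# The misplaced document: city inside name, middle directly under person.
MISPLACED = f"""
<pns:person xmlns:pns="{PNS}" xmlns:midns="{MIDNS}" xmlns:cityns="{CITYNS}">
  <pns:name>
    <pns:first>Dave</pns:first>
    <pns:last>Orchard</pns:last>
    <cityns:city>Vancouver</cityns:city>
  </pns:name>
  <midns:middle>B</midns:middle>
</pns:person>
""".strip()

print(extension_errors(ET.fromstring(MISPLACED)))  # prints []
```

The check only asks whether each extension element is declared somewhere. It has no notion of which wildcard position an element belongs to, which is why the misplaced document produces no errors.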

Person schema change

Another option is that they could update the schema to specify the middle name as an optional element inside the name. This would prevent the city from occurring in the name, but it would also prevent any other extension inside the name, as they have to lose the wildcard. The reason is the pesky Unique Particle Attribution constraint, which means that an optional element in a given namespace can't occur before a wildcard that might match that namespace. The following schema is illegal:

<xs:complexType name="person">
  <xs:sequence>
    <xs:element name="name" type="tns:name" />
    <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="name">
  <xs:sequence>
    <xs:element name="first" type="xs:string" />
    <xs:element name="last" type="xs:string" minOccurs="0" />
    <xs:element ref="midns:middle" minOccurs="0"/>
    <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

XML Schema, mostly because of the Unique Particle Attribution rule, does not have a mechanism for refining wildcards to say which optional elements are allowed and which are not allowed in a particular wildcard while retaining extensibility.
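To make the Unique Particle Attribution conflict concrete, here is a deliberately simplified Python sketch. It is not the full rule from the XML Schema specification: it flags only the case discussed here, an element with a variable occurrence count followed in a sequence by a wildcard whose namespace constraint matches that element. The Particle class, the function names, and the namespace URIs are all hypothetical.

```python
from dataclasses import dataclass

PNS = "urn:example:person"    # hypothetical target namespace
MIDNS = "urn:example:middle"  # hypothetical extension namespace
UNBOUNDED = float("inf")


@dataclass
class Particle:
    kind: str        # "element" or "any"
    namespace: str   # element: namespace URI; wildcard: namespace constraint
    name: str = ""
    min_occurs: int = 1
    max_occurs: float = 1


def wildcard_matches(constraint, target_ns, element_ns):
    """XSD 1.0 namespace matching for an xs:any namespace constraint."""
    if constraint == "##any":
        return True
    if constraint == "##other":
        return element_ns not in ("", target_ns)
    allowed = set()
    for token in constraint.split():
        if token == "##targetNamespace":
            allowed.add(target_ns)
        elif token == "##local":
            allowed.add("")
        else:
            allowed.add(token)
    return element_ns in allowed


def upa_conflicts(sequence, target_ns):
    """Simplified check: variable-count element, then a matching wildcard."""
    conflicts = []
    for i, particle in enumerate(sequence):
        if particle.kind != "element":
            continue
        if particle.min_occurs == particle.max_occurs:
            continue  # fixed count: the parser never has to choose
        for later in sequence[i + 1:]:
            if later.kind == "any" and wildcard_matches(
                later.namespace, target_ns, particle.namespace
            ):
                conflicts.append((particle.name, later.namespace))
                break
            if later.min_occurs > 0:
                break  # a required particle separates the two
    return conflicts


original = [
    Particle("element", PNS, "first"),
    Particle("element", PNS, "last", min_occurs=0),
    Particle("any", "##other", min_occurs=0, max_occurs=UNBOUNDED),
]
revised = (
    original[:2]
    + [Particle("element", MIDNS, "middle", min_occurs=0)]
    + original[2:]
)

print(upa_conflicts(original, PNS))  # prints []
print(upa_conflicts(revised, PNS))   # prints [('middle', '##other')]
```

The original name type is legal because the optional last element is in the target namespace, which ##other excludes. Adding an optional midns:middle before the same wildcard makes a middle element attributable to two particles.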

The author has really only 3 options when adding elements in this design:

  1. Make no schema change: then placement of any extension cannot be validated
  2. Make a schema change to add the optional content: then the schema loses extensibility and hence precludes any other party's extensions
  3. Make a schema change to add the content as required: then the schema loses backwards compatibility

None of these options is desirable. They are the only options because the fundamental problem with using wildcards outside of an Extension element for versioning is that a schema cannot correctly validate extensions and retain compatibility. This is because XML Schema lacks the ability to refine a wildcard with optional elements while keeping the wildcard. So when a namespace owner tries to use wildcards at the same level as elements for versioning, they run into this problem.

Extension elements solve part of the versioning problem

In contrast, the solution that I advised in my article on versioning XML languages does allow an author to add optional elements (such as middle and city), constrain where these elements can occur, and retain backwards and forwards compatibility. The design is a schema that has Extension elements to contain any extensions. In any subsequent version, the schema author overlays a new type on the older type by replacing a wildcard with the new elements.

This is the linchpin of this design: to allow forward compatibility, a wildcard is used inside an Extension element. In a subsequent revision of the specification, the wildcard where extension occurs is replaced with an optional element (optional elements preserve backwards compatibility) and a new Extension element is placed after the optional element.

There are two options for the namespace constraint of the wildcard inside the Extension element: the target namespace (##targetNamespace) or other namespaces (##other). Given that the wildcard is going to be replaced with an element if the namespace owner makes a change, they can use ##targetNamespace and effectively "promise" how they will make changes. If they use ##other, as many solutions do, any revision to the schema that retains forwards and backwards compatibility (replacing the wildcard with an optional element) will end up invalidating everybody else's extensions. Thus it is more desirable to use ##targetNamespace, as this promises that the namespace owner can only invalidate their own extensions!
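The difference between the two wildcard choices can be sketched with a toy Python model. It looks only at the first child of an Extension element: in the first version that child must match the wildcard, and in the revision it must be the element that replaced the wildcard. The namespace URIs, the nickname and badge elements, and the function names are hypothetical and exist only for illustration.

```python
PNS = "urn:example:person"        # hypothetical namespace owner
CITYNS = "urn:example:city"       # hypothetical second namespace of the owner
THIRDNS = "urn:example:partner"   # hypothetical third party


def wildcard_matches(constraint, element_ns):
    """Namespace matching for the two constraints compared in the text."""
    if constraint == "##targetNamespace":
        return element_ns == PNS
    if constraint == "##other":
        return element_ns not in ("", PNS)
    raise ValueError(f"unsupported constraint: {constraint}")


def broken_by_revision(constraint, promoted, candidates):
    """List children accepted by the v1 wildcard but rejected once the
    wildcard has been replaced by the promoted element."""
    return [
        child
        for child in candidates
        if wildcard_matches(constraint, child[0]) and child != promoted
    ]


# Possible first children of an Extension element, as (namespace, local name).
CANDIDATES = [
    (PNS, "middle"),
    (PNS, "nickname"),
    (CITYNS, "city"),
    (THIRDNS, "badge"),
]

# ##targetNamespace: only the owner's own unreleased extension breaks.
print(broken_by_revision("##targetNamespace", (PNS, "middle"), CANDIDATES))
# prints [('urn:example:person', 'nickname')]

# ##other: a third party's extension breaks.
print(broken_by_revision("##other", (CITYNS, "city"), CANDIDATES))
# prints [('urn:example:partner', 'badge')]
```

Under these assumptions, a revision made behind a ##targetNamespace wildcard can only invalidate content in the owner's namespace, while the same revision behind an ##other wildcard invalidates content the owner does not control.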

To illustrate the benefits of an Extension element and compare ##targetNamespace and ##other, we make a schema that uses Extension elements for person:

<xs:element name="person" type="pns:person"/>
<xs:complexType name="person">
  <xs:sequence>
    <xs:element name="name" type="pns:name" />
    <xs:element name="Extension" type="pns:OtherNSExtensionType" minOccurs="0" />
    <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="name">
  <xs:sequence>
    <xs:element name="first" type="xs:string" />
    <xs:element name="last" type="xs:string" minOccurs="0" />
    <xs:element name="Extension" type="pns:ExtensionType" minOccurs="0" />
    <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="ExtensionType">
  <xs:sequence>
    <xs:any processContents="lax" minOccurs="1" maxOccurs="unbounded" namespace="##targetNamespace"/>
  </xs:sequence>
  <xs:anyAttribute/>
</xs:complexType>
<xs:complexType name="OtherNSExtensionType">
  <xs:sequence>
    <xs:any processContents="lax" minOccurs="1" maxOccurs="unbounded" namespace="##other"/>
  </xs:sequence>
  <xs:anyAttribute/>
</xs:complexType>

Using this model, the namespace owner extends the person schema with a middle name in the person namespace and a city in the city namespace (cityns). The revised schema is:

<xs:element name="person" type="pns:person"/>
<xs:complexType name="person">
  <xs:sequence>
    <xs:element name="name" type="pns:name"/>
    <xs:element name="Extension" type="pns:cityExtension" minOccurs="0"/>
    <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="name">
  <xs:sequence>
    <xs:element name="first" type="xs:string"/>
    <xs:element name="last" type="xs:string" minOccurs="0"/>
    <xs:element name="Extension" type="pns:middleExtension" minOccurs="0"/>
    <xs:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="middleExtension">
  <xs:sequence>
    <xs:element name="middle" type="xs:string"/>
    <xs:element name="Extension" type="pns:ExtensionType" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="cityExtension">
  <xs:sequence>
    <xs:element ref="cityns:city"/>
    <xs:element name="Extension" type="pns:OtherNSExtensionType" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="ExtensionType">
  <xs:sequence>
    <xs:any namespace="##targetNamespace" minOccurs="1" processContents="lax" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:anyAttribute/>
</xs:complexType>
<xs:complexType name="OtherNSExtensionType">
  <xs:sequence>
    <xs:any processContents="lax" minOccurs="1" maxOccurs="unbounded" namespace="##other"/>
  </xs:sequence>
  <xs:anyAttribute/>
</xs:complexType>

The following document is correctly invalid under the newer schema, because the city appears in the name's Extension and the middle name in the person's Extension:

<pns:person>
  <pns:name>
    <pns:first>Dave</pns:first>
    <pns:last>Orchard</pns:last>
    <pns:Extension>
      <cityns:city>Vancouver</cityns:city>
    </pns:Extension>
  </pns:name>
  <pns:Extension>
    <pns:middle>B</pns:middle>
  </pns:Extension>
</pns:person>

And the following document is correctly valid under both the original and the new schema:

<pns:person>
  <pns:name>
    <pns:first>Dave</pns:first>
    <pns:last>Orchard</pns:last>
    <pns:Extension>
      <pns:middle>B</pns:middle>
    </pns:Extension>
  </pns:name>
  <pns:Extension>
    <cityns:city>Vancouver</cityns:city>
  </pns:Extension>
</pns:person>

The technique of using an Extension element with a ##targetNamespace wildcard has an advantage over using an ##other wildcard: a single namespace and its related schema can be updated with the new type information while others retain their ability to extend the instance. There is a "master" schema that a namespace owner can see that controls their documents, though there obviously may be schema modularization. Using a single namespace is preferable to multiple namespaces.

A final observation: the SOAP specification follows this model of containing wildcards within container elements (specifically Header and Body), so this technique should not be new to Web services developers. And the WSDL specification has the difficult task of determining how to create schemas that constrain the wildcard elements. This is so difficult that WSDL 1.1 does not express optional header blocks, and WSDL 2.0 is on the same path.

Conclusion

This article shows that using a wildcard at the same level as elements for compatible evolution of a schema does not allow correct validation using updated schemas whilst retaining compatible evolution. The advocated technique, using an Extension element, allows the correct validation of new and old documents under newer and older schemas. And using ##targetNamespace in the wildcard allows for a simpler mechanism than ##other. There are some drawbacks to the Extension element technique because of further limitations of wildcards (they are still more lenient than we need them to be), and these are addressed separately.