Comparing 2 techniques for compatible Schema evolution

As I advocated in XML extensibility and versioning, it's crucial to provide extensibility points to enable compatible evolution. This enables somebody, including the namespace name owner, to add content into messages without requiring a change in software that doesn't understand the extension.

The determinism constraint in XML Schema gives us a fair amount of heartburn, because we can't have optional elements followed by XML Schema wildcards that allow elements in the optional elements' namespace. I've shown the extension possibilities, their problems and potential XML Schema solutions in Providing Compatible Schema Evolution. The crux of the problem is that we can't both add an optional element (which preserves backwards compatibility) and keep the extensibility point (which preserves future forwards compatibility). In the next version of the schema, the schema designer has to choose between adding the optional element and dropping the extensibility point, or keeping the extensibility point and leaving the optional element out of the schema.
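
To make the problem concrete, here is a minimal sketch of the kind of content model that XML Schema 1.0 rejects. It reuses the name vocabulary from the examples later in this post; the ##any wildcard is chosen purely for illustration, and any wildcard that admits the target namespace would run into the same issue.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.openuri.org/name/"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:pns="http://www.openuri.org/name/"
           elementFormDefault="qualified">
  <xs:element name="name" type="pns:name"/>
  <xs:complexType name="name">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <!-- An optional element in the target namespace ... -->
      <xs:element name="last" type="xs:string" minOccurs="0"/>
      <!-- ... followed by a wildcard that also admits the target namespace.
           A pns:last element could be attributed to either particle, so a
           conforming XML Schema 1.0 processor should report the content
           model as non-deterministic. -->
      <xs:any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Restricting the wildcard to ##other makes the schema valid again, but then extensions in the target namespace are no longer allowed at that point, which is exactly the trade-off described above.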

In my article, I described a technique that enables the determinism constraint to be worked around. A valid schema can be created that enables a next version schema to add in the optional element and still have an extensibility point. We'll call this scenario the "Extension Element". Interestingly, XML Schema's appInfo element is an Extensibility element.

There is another solution, which is that the extensibility point can be separated from the optional element by inserting a pre-defined element before the extensibility point. We'll call this scenario the "Extensibility Marker". Dare Obasanjo wrote up an article called Designing Extensible, Versionable XML Formats that introduces this technique.

It seems to me that the trade-off is roughly between: simpler readability of extensions in the same namespace, or inclusion of 3rd party extensions in subsequent schemas. If you think that you will use new namespaces for your extensions and you want to write a Schema that includes these extensions, then you'll probably want to use the Extension Element technique. If you plan on doing your extensions in the same namespace and aren't very concerned about adding 3rd party components in your schema then you probably want to use the Marker element technique.

It might be possible to add another marker (say EndOfElement) to indicate the end of the other namespaced elements. This might enable the writing of Schemas using the Extensibility Marker technique for both same and other namespaced elements. But I haven't looked into this in detail and I'm simply guessing there might be a workaround.

Let's look at a detailed comparison of the techniques.

Extension Elements
The trick that I describe moves the extensibility point inside an XML element. When the extension is used, the extension is inside the extensibility element. As the extensibility element is known in the schema, there are 2 optional elements in the schema, rather than an optional element followed by a wildcard. The key to the trick is that the wildcard is a child of the Extension element. The extension is always a child of the extension element, so there is no confusion between the wildcard and any optional elements.

In V2 of the schema, an optional element is added by changing the reference to the same namespace extension type into a reference to a new type that contains the optional element followed by the same namespace extensibility element. This basically swaps the same namespace extensibility point for the optional element followed by the Extensibility element.

A V1 Name schema with extension elements for the target namespace is:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.openuri.org/name/" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:pns="http://www.openuri.org/name/" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="namelist">
<xs:complexType>
<xs:sequence>
<xs:element ref="pns:name" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>

<xs:element name="name" type="pns:name"/>
<xs:complexType name="name">
<xs:sequence>
<xs:element name="first" type="xs:string"/>
<xs:element name="last" type="xs:string" minOccurs="0"/>
<xs:element name="Extension" type="pns:ExtensionType" minOccurs="0"/>
<xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:anyAttribute/>
</xs:complexType>

<xs:complexType name="ExtensionType">
<xs:sequence>
<xs:any namespace="##targetNamespace" processContents="lax" maxOccurs="unbounded"/>
</xs:sequence>
<xs:anyAttribute/>
</xs:complexType>

</xs:schema>

And a list of names that is valid is:

<?xml version="1.0" encoding="UTF-8"?>
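<!-- midns stands for a third-party namespace; its URI here is an assumed example value -->
<pns:namelist xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pns="http://www.openuri.org/name/" xmlns:midns="http://www.example.org/middle/">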
<pns:name>
<pns:first>Dave</pns:first>
</pns:name>

<pns:name>
<pns:first>Dave</pns:first>
<pns:last>Orchard</pns:last>
</pns:name>

<pns:name>
<pns:first>Dave</pns:first>
<pns:last>Orchard</pns:last>
<midns:middle>Bryce</midns:middle>
</pns:name>

<pns:name>
<pns:first>Dave</pns:first>
<pns:last>Orchard</pns:last>
<pns:Extension>
<pns:middle>Bryce</pns:middle>
</pns:Extension>
</pns:name>

<pns:name>
<pns:first>Dave</pns:first>
<pns:last>Orchard</pns:last>
<pns:Extension>
<pns:middle>Bryce</pns:middle>
<pns:Extension>
<pns:prefix>Mr</pns:prefix>
</pns:Extension>
</pns:Extension>
</pns:name>
</pns:namelist>

Adding in a middle name produces the following Schema:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.openuri.org/name/" xmlns:pns="http://www.openuri.org/name/" xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="namelist">
<xs:complexType>
<xs:sequence>
<xs:element ref="pns:name" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>

<xs:element name="name" type="pns:name"/>
<xs:complexType name="name">
<xs:sequence>
<xs:element name="first" type="xs:string"/>
<xs:element name="last" type="xs:string" minOccurs="0"/>
<xs:element name="Extension" type="pns:middleExtension" minOccurs="0"/>
<xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="middleExtension">
<xs:sequence>
<xs:element name="middle" type="xs:string"/>
<xs:element name="Extension" type="pns:ExtensionType" minOccurs="0"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="ExtensionType">
<xs:sequence>
<xs:any namespace="##targetNamespace" processContents="lax" maxOccurs="unbounded"/>
</xs:sequence>
<xs:anyAttribute/>
</xs:complexType>
</xs:schema>

A good thing about this is that the instance document validates against both schemas.
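
The same step can in principle be repeated for a third version. The following is only a sketch under that assumption: the prefixExtension type name is hypothetical, and prefix is borrowed from the nested Extension in the instance document above. Each new version replaces the innermost open extension type with a type that names the new element and then offers a fresh Extension slot.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.openuri.org/name/"
           xmlns:pns="http://www.openuri.org/name/"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="namelist">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="pns:name" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="name" type="pns:name"/>
  <xs:complexType name="name">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="last" type="xs:string" minOccurs="0"/>
      <xs:element name="Extension" type="pns:middleExtension" minOccurs="0"/>
      <xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>

  <!-- V2 content: middle, then the next extension slot -->
  <xs:complexType name="middleExtension">
    <xs:sequence>
      <xs:element name="middle" type="xs:string"/>
      <xs:element name="Extension" type="pns:prefixExtension" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Hypothetical V3 content: prefix, then the still-open extension point -->
  <xs:complexType name="prefixExtension">
    <xs:sequence>
      <xs:element name="prefix" type="xs:string"/>
      <xs:element name="Extension" type="pns:ExtensionType" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ExtensionType">
    <xs:sequence>
      <xs:any namespace="##targetNamespace" processContents="lax" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>
</xs:schema>
```

Note how the nesting grows by one Extension level per version. That is the readability cost of this technique for same namespace extensions.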

In the scenario I wrote up, I only used this technique for extensions in the same namespace.

The technique can be used for extensions in other namespaces as well. The good folks at RIGs did this.
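
As a rough sketch (not the RIGs schema itself), the same trick might be applied to other namespaces as follows. OtherExtension and OtherExtensionType are hypothetical names introduced only for this illustration. The ##other wildcard moves inside its own wrapper element, so a later version is free to add named content in front of it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.openuri.org/name/"
           xmlns:pns="http://www.openuri.org/name/"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="name" type="pns:name"/>
  <xs:complexType name="name">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="last" type="xs:string" minOccurs="0"/>
      <!-- Same namespace extensions live inside Extension -->
      <xs:element name="Extension" type="pns:ExtensionType" minOccurs="0"/>
      <!-- Other namespace extensions live inside the assumed OtherExtension wrapper -->
      <xs:element name="OtherExtension" type="pns:OtherExtensionType" minOccurs="0"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>

  <xs:complexType name="ExtensionType">
    <xs:sequence>
      <xs:any namespace="##targetNamespace" processContents="lax" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>

  <xs:complexType name="OtherExtensionType">
    <xs:sequence>
      <xs:any namespace="##other" processContents="lax" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>
</xs:schema>
```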

The key to getting around non-determinism is having a version 1.0 vocabulary define an element that will go before any extension and after any optional elements. In this case, the element is a parent of the extension.

Extension Marker
Instead of defining a same namespace extensibility element and an other namespace extensibility element, we can define a same namespace marker element and an other namespace marker element. These marker elements have no children; they are simply markers that separate the differently namespaced elements.

Instead of having 2 optional extensibility elements in the schema, the schema has 2 nested sequences followed by the other namespace extensibility point. The outer sequence contains an inner sequence (the same namespace marker element followed by the same namespace extensibility point) and then the other namespace marker element.

If any extensions are added in the same namespace, then both the same namespace marker element and the other namespace marker element are required to be inserted.

In V2 of the schema, optional elements are added by replacing the extensibility point with the optional element followed by a sequence of the marker and an extensibility point.

The V1 schema using Marker elements is:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.openuri.org/name/" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:pns="http://www.openuri.org/name/" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="namelist">
<xs:complexType>
<xs:sequence>
<xs:element ref="pns:name" maxOccurs="unbounded"/>
</xs:sequence>
<xs:anyAttribute/>
</xs:complexType>
</xs:element>

<xs:element name="name" type="pns:name"/>
<xs:complexType name="name">
<xs:sequence>
<xs:element name="first" type="xs:string"/>
<xs:element name="last" type="xs:string" minOccurs="0"/>
<xs:sequence minOccurs="0">
<xs:sequence maxOccurs="unbounded">
<xs:element ref="pns:sameNsExtensionMarker"/>
<xs:any namespace="##targetNamespace" processContents="lax" maxOccurs="unbounded"/>
</xs:sequence>
<xs:element ref="pns:otherNsExtensionMarker"/>
</xs:sequence>
<xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:anyAttribute/>
</xs:complexType>

<xs:element name="sameNsExtensionMarker">
<xs:complexType/>
</xs:element>

<xs:element name="otherNsExtensionMarker">
<xs:complexType/>
</xs:element>

</xs:schema>

A document containing valid instances is:


<?xml version="1.0" encoding="UTF-8"?>
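<!-- midns stands for a third-party namespace; its URI here is an assumed example value -->
<pns:namelist xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pns="http://www.openuri.org/name/" xmlns:midns="http://www.example.org/middle/">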
<pns:name>
<pns:first>Dave</pns:first>
</pns:name>

<pns:name>
<pns:first>Dave</pns:first>
<pns:last>Orchard</pns:last>
</pns:name>

<pns:name>
<pns:first>Dave</pns:first>
<pns:last>Orchard</pns:last>
<midns:middle>Bryce</midns:middle>
</pns:name>

<pns:name>
<pns:first>Dave</pns:first>
<pns:last>Orchard</pns:last>
<pns:sameNsExtensionMarker/>
<pns:middle>Bryce</pns:middle>
<pns:otherNsExtensionMarker/>
</pns:name>

<pns:name>
<pns:first>Dave</pns:first>
<pns:last>Orchard</pns:last>
<pns:sameNsExtensionMarker/>
<pns:middle>Bryce</pns:middle>
<pns:sameNsExtensionMarker/>
<pns:prefix>Mr</pns:prefix>
<pns:otherNsExtensionMarker/>
</pns:name>
</pns:namelist>

Adding the middle name into the Schema yields:


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.openuri.org/name/" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:pns="http://www.openuri.org/name/" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="namelist">
<xs:complexType>
<xs:sequence>
<xs:element ref="pns:name" maxOccurs="unbounded"/>
</xs:sequence>
<xs:anyAttribute/>
</xs:complexType>
</xs:element>

<xs:element name="name" type="pns:name"/>
<xs:complexType name="name">
<xs:sequence>
<xs:element name="first" type="xs:string"/>
<xs:element name="last" type="xs:string" minOccurs="0"/>
<xs:sequence minOccurs="0">
<xs:sequence maxOccurs="unbounded">
<xs:element ref="pns:sameNsExtensionMarker"/>
<xs:element name="middle" type="xs:string"/>
<xs:sequence minOccurs="0" maxOccurs="unbounded">
<xs:element ref="pns:sameNsExtensionMarker"/>
<xs:any namespace="##targetNamespace" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:sequence>
<xs:element ref="pns:otherNsExtensionMarker"/>
</xs:sequence>
<xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>

<xs:element name="sameNsExtensionMarker">
<xs:complexType/>
</xs:element>

<xs:element name="otherNsExtensionMarker">
<xs:complexType/>
</xs:element>
</xs:schema>

Comparing techniques

Either of these techniques is eminently suitable for providing multiple versions of schemas that add in optional elements and retain extensibility. Neither of them is particularly elegant or appealing compared to our intuition of how simple it could be. Both schemas are of roughly the same complexity, and the instance documents are of similar complexity.

The extension element technique has the advantage that extensions can be included in the schema. If it is ever possible that the V2 schema will need to refer to another namespace, then the Extension element technique is the only one of the two that allows it.

The extensibility marker does have appeal: it allows flat elements, and it needs one fewer element for version 3 and later. The downside is that each new element has to be inserted into the element's parent as a sequence plus the element. This means that the type increases in length (by roughly 4 items) every time an optional element is added.


About

This page contains a single entry from the blog posted on July 28, 2004 2:51 PM.
