Extending and Versioning XML Languages with XML Schema

Introduction

XML is designed for the creation of languages based upon self-describing markup.  The inevitable evolution of these languages, by adding, deleting, and changing parts, is called versioning. Making versioning work in practice is one of the most difficult problems in computing, with a long history of failed attempts.  Arguably, the Web rose dramatically in popularity because evolution and versioning were built into HTML and HTTP headers. Both languages provide explicit extensibility points and rules for understanding extensions that enabled the decentralized extension and versioning of the languages. 

XML Namespaces provide an ideal mechanism for identifying versions of languages, and all XML schema languages – such as W3C XML Schema – provide for controlled extensibility.  

This article describes techniques to achieve more effective loose coupling between systems by providing a means for backwards- and forwards-compatible changes to occur when systems evolve.  These techniques are designed to allow compatible changes with or without schema propagation.  A number of questions, design patterns and rules are introduced to enable versioning in XML vocabularies, making use of XML Namespaces and XML Schema constructs.  This includes rules for working with languages that provide an extensible container model, notably SOAP. 

The collective set of guidance is called the “Must Ignore” pattern of extensibility.  Strangely, the “Must Ignore” pattern for HTML tags and HTTP headers that significantly helped the Web’s adoption has not been widely adopted by XML practitioners.  This article aims to rectify that situation within the constraints of current Schema validation environments.  This article is permanently available [1] and is an update of the XML.com article on Versioning XML Languages [17].

Defining Compatibility

 

FOLDOC [2] provides definitions of backwards and forwards compatibility. This article will reprise those definitions and focus on the exchange of document instances.  The terms consumer and producer are used in relation to document and message oriented exchanges.  Web services readers can translate this article’s use of producer to sender, consumer to receiver, and instance to message. 

Backwards compatibility means that a newer version of a consumer can be rolled out in a way that does not break existing producers.   A producer can send an older version of a message to a consumer that understands the new version and still have the message successfully processed.  Forwards compatibility means that a newer version of a producer can be rolled out in a way that does not break existing consumers. Of course the older consumer will not implement any new behavior, but a producer can send a newer version of an instance and still have the instance successfully processed.

In other words, backwards compatibility means that existing documents can be used by updated consumers, and forwards compatibility means that newer documents can be used by existing consumers.  Another way of thinking of this relates to message exchanges, with producers on the left and consumers on the right.  Backwards compatibility is where the right side (consumer) is updated, and forwards compatibility is where the left side (producer) is updated, as shown in Figure 1.

 

 

Figure 1 – Evolution of Producers and/or Consumers

Some typical backwards- and forwards-compatible changes:
- adding optional components (element(s) and/or attribute(s))
- adding optional content to a component’s content model (such as adding an enumeration)

Some typically incompatible changes:
- changing the meaning or semantics of existing components
- adding required components
- removing required components
- restricting a component’s content model (such as changing a choice to a sequence)

The costs associated with introducing changes that are not backward- or forward-compatible are often very high, typically requiring deployed software to be updated to accommodate the newer version, or the deployment, management and related costs of running multiple instances. 

Compatibility is defined for the producer and consumer of an individual instance.  However, most Web service specifications provide definitions of inputs and outputs.  In these definitions of compatibility, a Web service that updates its output message schema is considered a newer producer.  This simply reverses the producer/consumer terminology of input instances when applying compatibility definitions to output instances.  If a Web service updates the schema of the output message, then it is “sending” a newer version of the message, hence it is considered a “producer”.

Language Questions

Having defined compatibility, the choices facing a language designer can be described. 

Can 3rd parties extend (version) the language?  It is rarely desirable to prevent 3rd parties from extending languages on their own, but it does happen.  An example may be a tightly constrained security environment where distributed authoring is considered a “bug” rather than a feature.

Can 3rd parties extend the language in a compatible way?  If so, a substitution mechanism, such as simply ignoring unknown extensions, is required for forwards compatibility.  

Can 3rd parties extend (version) the language in an incompatible way?  If so, then incompatible changes can be made as an override of the substitution mechanism (such as a must-understand model or extension), or incompatibility can even be the default.  For example, the WS-Security committee decided that 3rd parties can only provide incompatible extensions.  Unlike most languages, a security language has unique requirements: the consequences of ignored data can be severe.  The committee accomplished this by specifying that all extensions are required to be understood and that there is no substitution mechanism.

Can the designer extend the language in a compatible way?  As with compatible 3rd-party extensions, a substitution mechanism for the designer’s extensions is required for forwards compatibility.

A question that does not need to be asked is: “Can the designer extend (version) the language in an incompatible way?”  They can always do this by using new namespace names, element names or version numbers.

Is the vocabulary a stand-alone language or an extension of another vocabulary?  Part of this question is whether the language depends on another language.  This determines which facilities, if any, are provided for the language and what must be provided.  An example is that SOAP headers can use the soap:mustUnderstand attribute and processing model even though the contents of the SOAP headers are languages independent of SOAP.

What Schema language(s)?  This guides the language design, as some features, particularly extensibility, must be planned for in V1, and various features may be incompatible across different schema languages.  For example, writing a V2-compatible Schema in XML Schema requires special design (shown later), which is not required in a schema language such as RelaxNG.

Should extensions or versions be expressible in the Schema language?  The ability to write a schema for extensions or versions is directly affected by the schema design and the compatibility desires.

Language Decisions

Upon answering these questions, there are some key decisions that a language developer makes, whether they are consciously made or not.

Schema language design choices or constraints.  If the language can be extended in a compatible way, then a few specific schema design choices must be followed.

Wildcards are used to provide extensibility in XML Schema. If revisions to the Schema are to support substitution, specific schema designs must be used in conjunction with the wildcard. The main choices are: provide wildcards, provide Extension elements, or provide delimiter elements.  Extension and delimiter elements are described in the new components in existing or new namespaces section.   If Extension/delimiter elements are not provided, then a compatible V2 Schema cannot be written.

Substitution Mechanism.  Forwards compatibility can only be achieved by providing a substitution mechanism that lets a V1 consumer, without knowledge of V2, process Version 2 instances or Version 1 instances that carry extensions.  A V1 consumer must be able to transform any such instance, such as V1 + extensions, into a V1 instance in order to process it.  The “Must Ignore unknown” rule is a simple substitution mechanism.  This rule says that any extensions are “ignored”.  Using it, a V1 + extensions document is transformed into a V1 document by ignoring the extensions.  Other substitution mechanisms exist, such as the fallback model in XSLT.

Component identification.  The identification of components into language versions or extensions has a variety of general mechanisms related to namespaces.  These are detailed in the Versioning section.

Identification of incompatible extensions.  The identification of versions is covered by language identification, but 3rd parties cannot arbitrarily change versions or change namespaces.  They may need a mechanism to indicate that an extension is an incompatible change.  A couple of mechanisms are a “Must Understand” identifier (such as a flag or list of required namespaces) or requiring that extensions are in substitution groups.

Identifying and Extending Languages

Designing extensibility into languages typically results in systems that are more loosely coupled.  Extensibility allows authors to change instances without going through a centralized authority, and may allow the centralized authority greater opportunities for versioning.  The common characteristic of a compatible change is the use of extensibility.

A supreme example of the benefits of extensibility is HTML.  The first version of HTML was designed for extensibility; it said that “unknown markup” may be encountered.  An example of this in action is the addition of the IMG tag by the Mosaic browser team. 

The first rule introduced in this article relating to extensibility is: 

1.    Allow Extensibility rule: Languages SHOULD be designed for extensibility.

A fundamental requirement for extensibility is to be able to determine the language of elements and attributes.  XML Namespaces [13] provide a mechanism for associating a URI with an XML element or attribute name, thus specifying the language of the name.  This also serves to prevent name collisions.

HTML did not have the ability to distinguish between the namespaces of extensions.  This meant that authors could produce the same element name but with different interpretations, and software would have no way of determining which interpretation is applicable.  This is a great part of the motivation to move from HTML to the XML vocabulary of HTML, XHTML.

W3C XML Schema [14] provides a mechanism called a wildcard, <xs:any>, for controlling where elements from certain namespaces are allowed.  The wildcard indicates that elements in specified namespaces are allowed in instance documents where the wildcard occurs.  This allows for later extension of a schema in a well-defined manner.  Consumers of extended documents can identify and, depending upon their processing model, safely ignore the extensions they don't understand. 

<xs:any> uses the namespace attribute to control what namespaces extension elements can come from.  The most interesting values for this attribute are: ##any, which means one can extend the schema using an element from any possible namespace; ##other, which only allows extension elements from namespaces other than the target namespace of the schema; and ##targetNamespace, which only allows extension elements from the target namespace of the schema.

<xs:any> uses the processContents attribute to control how an XML parser validates extension elements.  Permissible values are “lax” (validate any elements from supported namespaces but ignore all other elements), “strict” (validate all elements), and “skip” (validate no elements).  This article recommends “lax” validation, as it is the most flexible and is the typical choice for Web services specifications.
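The effect of the namespace attribute can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not part of any schema processor: the function name wildcard_allows and its parameters are hypothetical, None stands for an element in no namespace, and the handling of ##other follows XML Schema 1.0, where unqualified elements are excluded along with the target namespace.

```python
def wildcard_allows(namespace_attr, target_ns, element_ns):
    """Sketch: would an element in element_ns match an <xs:any> wildcard?

    namespace_attr -- value of the wildcard's namespace attribute
    target_ns      -- target namespace of the schema holding the wildcard
    element_ns     -- namespace of the candidate element (None = no namespace)
    """
    if namespace_attr == "##any":
        return True
    if namespace_attr == "##other":
        # XML Schema 1.0: any namespace except the target namespace;
        # elements in no namespace are excluded as well.
        return element_ns is not None and element_ns != target_ns
    # Otherwise the value is a whitespace-separated list of namespace URIs,
    # optionally including ##targetNamespace and ##local.
    allowed = set()
    for token in namespace_attr.split():
        if token == "##targetNamespace":
            allowed.add(target_ns)
        elif token == "##local":
            allowed.add(None)
        else:
            allowed.add(token)
    return element_ns in allowed


NAME_NS = "http://www.openuri.org/name/1"
PREF_NS = "http://www.openuri.org/name/pref/1"

# A prefix element from another namespace against each wildcard style.
for style in ("##any", "##other", "##targetNamespace"):
    print(style, wildcard_allows(style, NAME_NS, PREF_NS))
```

The processContents attribute is independent of this check: it only decides how strictly an element that the wildcard has already admitted is validated.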

The main goal of the "Must Ignore" pattern of extensibility is to allow backwards and forwards compatible changes to documents. 

Example

Suppose that you have designed a language for handling personal information. The personal information consists of a “Name” element. The first version of the Name contains a “first” and a “last” element.  Our preference would be to have an extensibility style of ##any.  An XML Schema “name” type that uses this is:

<xs:complexType name="name">
  <xs:sequence>
    <xs:element name="first" type="xs:string"/>
    <xs:element name="last" type="xs:string"/>
    <xs:any namespace="##any" processContents="lax" 
            minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:anyAttribute/>
</xs:complexType>

Example 1 – A name schema using ##any for extensibility

However, the determinism constraint of XML Schema, described in more detail later, prevents this from working.  The problem arises in a version when an optional element is followed by a wildcard.  In this example, this occurs when an optional element is added and extensibility is still desired. This is an ungentle introduction to the difference between extensibility and versioning.  An optional middle name added into a subsequent version is a good example.  Consumers should be able to continue processing if they don’t understand an additional optional middle name, and we want to keep the extensibility point in the new version.  We can write a schema that contains the optional middle name and a wildcard for extensibility.  The following schema is roughly what is desired using wildcards, but it is illegal because of the determinism constraint:

<xs:complexType name="name">
  <xs:sequence>
    <xs:element name="first" type="xs:string"/>
    <xs:element name="last" type="xs:string"/>
    <xs:element name="middle" type="xs:string" minOccurs="0"/>
    <xs:any namespace="##any" processContents="lax" 
            minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:anyAttribute/>
</xs:complexType>

Example 2 – An illegal schema type for a backwards compatible version of the name schema

Since the above pattern does not work, we need to create a design pattern that enables roughly the equivalent in order to achieve the original goals. 
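The ambiguity behind the determinism constraint can be illustrated with a small Python sketch.  It is a deliberately simplified model rather than a schema validator: particles are plain tuples, the helper names are hypothetical, and the middle element is assumed to be in the target namespace, as the instances in this article suggest.

```python
TARGET_NS = "http://www.openuri.org/name/1"


def particle_matches(particle, ns, local):
    """Return True if the element (ns, local) can be attributed to particle."""
    if particle[0] == "element":
        _, p_ns, p_local = particle
        return (ns, local) == (p_ns, p_local)
    # Otherwise a wildcard: ("any", namespace attribute value)
    mode = particle[1]
    if mode == "##any":
        return True
    if mode == "##other":
        return ns is not None and ns != TARGET_NS
    raise ValueError("unsupported wildcard mode: " + mode)


def is_ambiguous(optional_element, wildcard, ns, local):
    """An element that matches both adjacent particles cannot be attributed
    to exactly one of them, which is what the determinism constraint forbids."""
    return (particle_matches(optional_element, ns, local)
            and particle_matches(wildcard, ns, local))


middle = ("element", TARGET_NS, "middle")

# Example 2: optional middle followed by a ##any wildcard.
print(is_ambiguous(middle, ("any", "##any"), TARGET_NS, "middle"))
# The same content model with a ##other wildcard instead.
print(is_ambiguous(middle, ("any", "##other"), TARGET_NS, "middle"))
```

With ##any, a middle element satisfies both the optional element declaration and the wildcard, so a processor cannot decide which particle it belongs to without looking ahead.  With ##other the two particles can never claim the same element.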

All compatible changes in new namespaces

The most common solution is to put all new components, either extensions or compatible versions, in a namespace different than the original namespace, that is the "##other" namespace option of wildcards.  We show a complete Schema instance:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.openuri.org/name/1" 
      xmlns:name="http://www.openuri.org/name/1"> 

  <xs:complexType name="name">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="last" type="xs:string"/>
      <xs:element name="middle" type="xs:string" minOccurs="0"/>
      <xs:any namespace="##other" processContents="lax" 
            minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>

</xs:schema>

Example 3 – New components in new namespace(s) schema Version 1

The language designer and 3rd parties can now only use different namespaces for their versions.  For allowing new extensions in the same namespace, the author must create an extension type that allows extensions in the same namespace.  The extension type should be used only for future compatible extensions in the same namespace.  We need two more rules to allow proper versioning of XML language definitions.  The reader is urged to keep in mind that all of these restrictions on behavior are a consequence of the W3C's XML Schema design and are unnecessary in other schema languages like RelaxNG. First the rule for namespaces:

2.    Allow Extensions in Other Namespace rule: The extensibility point SHOULD at least allow for extension in other namespaces.

The rule for allowing extensibility:

 

3.    Full Extensibility rule: All XML Elements SHOULD allow for element extensibility after element definitions, and allow any attributes.

In general, an extension can be defined by a new specification that makes a normative reference to the earlier specification and then defines the new element. No permission should be needed from the authors of the specification to make such an extension.  In fact, the major design point of XML namespaces is to allow decentralized extensions.  The corollary is that permission is required for extensions in the same namespace.  A namespace has an owner; non-owners changing the meaning of something can be harmful.

Attribute extensions do not have non-determinism issues because the attributes are always unordered and the model group for attributes uses a different mechanism for associating attributes with schema types than the model group for elements.

Understanding Extensions

Ideally, producers should be able to extend existing XML documents with new elements without consumers having to change existing implementations. Extensibility is one step towards this goal, but achieving compatibility also requires a processing model for the extensions.  The behavior of software when it encounters an extension should be clear.  For this, we introduce the next rule: 

4.    Provide Processing Model Rule: Languages SHOULD specify a processing model for dealing with extensions.

The simplest processing model that enables compatible changes is to ignore content that is not understood.  This rule is:

 

5.    Must Ignore Rule: Document consumers MUST ignore any XML attributes or elements in a valid XML document that they do not recognize.

This rule does not require that the elements be physically removed; only ignored for processing purposes.  There is a great deal of historic usage of the Must Ignore rule.  HTML 1, 2 and 3.2 follow the Must Ignore rule as they specify that any unknown start tags or end tags are mapped to nothing during tokenization.  HTTP 1.1 [7] specifies that a consumer should ignore any headers it doesn't understand: "Unrecognized header fields SHOULD be ignored by the recipient and MUST be forwarded by transparent proxies."  The Must Ignore rule for XML was first standardized in the WebDAV specification RFC 2518 [6] section 14 and later separately published as the Flexible XML Processing Profile [3].
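A consumer that follows the Must Ignore rule can be sketched with Python's standard xml.etree.ElementTree module.  This is a minimal illustration, not a reference implementation: the function read_name, the KNOWN_FIELDS tuple, and the pref:lang attribute in the second instance are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NAME_NS = "http://www.openuri.org/name/1"
KNOWN_FIELDS = ("first", "last", "middle")


def read_name(xml_text):
    """Read the fields this consumer understands; ignore everything else."""
    root = ET.fromstring(xml_text)
    # Only recognized child elements are looked up.  Unknown elements and
    # attributes stay in the tree but never cause a failure.
    return {field: root.findtext("{%s}%s" % (NAME_NS, field))
            for field in KNOWN_FIELDS}


plain = """<name xmlns="http://www.openuri.org/name/1">
  <first>Dave</first>
  <last>Orchard</last>
</name>"""

# Same name with an unknown element and a hypothetical unknown attribute.
extended = """<name xmlns="http://www.openuri.org/name/1"
      xmlns:pref="http://www.openuri.org/name/pref/1" pref:lang="en">
  <first>Dave</first>
  <last>Orchard</last>
  <pref:prefix>Mr.</pref:prefix>
</name>"""

print(read_name(plain) == read_name(extended))
```

Nothing is removed from the parsed document here; the extensions are simply never consulted, which matches the statement that ignoring is about processing rather than physical removal.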

There are two broad types of Must Ignore rules for dealing with extensions, either ignoring the entire tree or just the unknown element.  The rule for ignoring the entire tree is:

6.    Must Ignore All Rule: The Must Ignore rule applies to unrecognized elements and their descendants in data-oriented formats.

For example, if a message is received with unrecognized elements in a SOAP header block, they must be ignored unless marked as “Must Understand” (see Rule 10 below). Note that this rule is not broken if the unrecognized elements are written to a log file. That is, “ignored” doesn’t mean that unrecognized extensions can’t be processed; only that they can’t be the grounds for failure to process.

Other applications may need a different rule as the application will typically want to retain the content of an unknown element, perhaps for display purposes.  The rule for ignoring the element only is:

7.    Must Ignore Container Rule: The Must Ignore rule applies only to unrecognized elements in presentation-oriented formats.

This retains the element’s descendants in the processing model so that they can still affect interpretation of the document, such as for display purposes.
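The difference between the two rules can be sketched in Python with the standard xml.etree.ElementTree module.  The function names, the KNOWN set, and the sample markup are assumptions chosen for illustration; a real consumer would normally ignore content during processing rather than rewrite the tree.

```python
import xml.etree.ElementTree as ET


def ignore_all(elem, known):
    """Must Ignore All: drop unrecognized elements with their descendants."""
    for child in list(elem):
        if child.tag in known:
            ignore_all(child, known)
        else:
            elem.remove(child)
    return elem


def _append_text(parent, last, text):
    """Attach text after the last kept child, or to the parent if none."""
    if not text:
        return
    if last is None:
        parent.text = (parent.text or "") + text
    else:
        last.tail = (last.tail or "") + text


def ignore_container(elem, known):
    """Must Ignore Container: unwrap unrecognized elements, keep content."""
    children = list(elem)
    for child in children:
        elem.remove(child)
    last = None
    for child in children:
        ignore_container(child, known)
        if child.tag in known:
            elem.append(child)
            last = child
        else:
            # Splice the unknown element's text and children into its place.
            _append_text(elem, last, child.text)
            for grandchild in list(child):
                elem.append(grandchild)
                last = grandchild
            _append_text(elem, last, child.tail)
    return elem


KNOWN = {"p", "b"}
SAMPLE = "<p>Hello <blink>big <b>bold</b> world</blink>!</p>"

kept = ignore_container(ET.fromstring(SAMPLE), KNOWN)
print(ET.tostring(kept, encoding="unicode"))
dropped = ignore_all(ET.fromstring(SAMPLE), KNOWN)
print(len(list(dropped)))
```

In the presentation-oriented case the text and the recognized b element inside the unknown blink element survive, while the data-oriented rule discards the whole subtree.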

Ignoring content is a simple solution to the problem of substitution.  In order to achieve a compatible evolution, the newer instances of a language must be transformable (or substitutable) into older instances.  Object systems typically call this “polymorphism”, where a new type can behave as the old type. 

Other substitution models have been successfully deployed.  One such model is a fallback model, where alternate elements are provided if the consumer does not understand the extension.  XSLT 2.0 provides such a model.  Another model is that a transform from the new type to the old type is made available, either by value or reference.

As desirable as compatible evolution often is, sometimes a language may not want to allow it.  In this model, a consumer will generate a fault if it finds a component it doesn’t understand.  An example might be a security specification where a consumer must understand each and every extension.  This suffers from the significant drawback that it does not allow compatible changes to occur in the language, as any changes require both consumer and producer to change.

Versioning

A language designer decides how new versions of their language, as well as extensions, are related to previous versions.  They decide how to use namespace names, component names for their language, as well as possibly introducing versioning-specific components such as version identifiers and incompatible extension identifiers.  When a new version of a language is required, the author must make a decision about the namespace name for names in the new language.

Version identification has traditionally been done with a decimal point separating the major version from the minor version, e.g., “8.1” or “1.0”.  Often the definition of a “major” change is that it is incompatible, and the definition of a “minor” change is that it is forwards- and/or backwards-compatible.  Usually the first broadly available version starts at “1.0”.  A compatible version change from 1.0 might be identified as “1.1” and an incompatible change as “2.0”.  It should be noted that this is idealistic, as there are abundant cases where this system does not hold.  New major version identifiers are often aligned with product releases, or incompatible changes are identified as “minor” changes.  A good example of an incompatible change identified as a minor change is XML 1.1.  XML 1.0 processors cannot process all XML 1.1 documents because XML 1.1 extended XML 1.0 where XML 1.0 does not allow such extension.
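The idealized major.minor convention can be expressed as a short Python sketch.  The function names are hypothetical, and the check assumes that every minor change is compatible in both directions, which, as noted above, real version numbers do not always honor.

```python
def parse_version(version):
    """Split a 'major.minor' identifier into a pair of integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def is_compatible(consumer_version, instance_version):
    """Idealized convention: versions that share a major number are
    compatible; a different major number marks an incompatible change."""
    return parse_version(consumer_version)[0] == parse_version(instance_version)[0]


# A 1.0 consumer receiving a 1.1 instance, then a 2.0 instance.
print(is_compatible("1.0", "1.1"))
print(is_compatible("1.0", "2.0"))
```

Under this model XML 1.1 would be reported as compatible with XML 1.0 processors, which is exactly the kind of case where the convention fails.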

Version Identification Strategies

There are a large variety of version identification designs.  A few of the most common are listed below and described in more detail later.
1) all components in new namespace(s) for each version, e.g., version 1 consists of namespaces a + b, version 1.1 consists of namespaces c + d; or version 1 consists of namespace a, version 1.1 consists of namespace b.
2) all new components in new namespace(s) for each compatible version, e.g., version 1 consists of namespaces a + b; version 1.1 consists of namespaces a + b + c; version 2.0 consists of namespaces d + e.
3) all new components in existing or new namespace(s) for each compatible version, e.g., version 1 consists of namespace a, version 1.1 consists of namespace a, version 2 consists of namespace b; or version 1 consists of namespace a, version 1.1 consists of namespace a + b.
4) all new components in existing or new namespace(s) for each version and a version identifier, e.g., version 1 consists of namespace a + b + version attribute “1”, version 2 consists of namespace c + d + version attribute “2”.

Whatever the design chosen, the language designer must decide the component name, namespace name, and any version identifier for new and all existing components.  The trade-offs between the decisions relate to the importance of:

- Supporting compatible evolution.
- Using namespaces for identifying compatible components.  Changing namespace names is typically a very invasive change.
- A complete Schema for the language.  We will see how some designs preclude full Schema description.
- Use of generic tools that rely on XML and namespaces only (precluding vocabulary-specific versions).

Elaborating on these designs is illustrative.  The first option, new namespace(s) for each version, typically precludes compatible evolution, so we will not describe it in detail.

Version Strategy: all new components in new namespace(s) for each compatible version (#2)

Schema Example 3 showed a schema that allows new components in new namespace(s), so the following names would be valid:

<name xmlns="http://www.openuri.org/name/1">
  <first>Dave</first>
  <last>Orchard</last>
</name>
 
<name xmlns="http://www.openuri.org/name/1">
  <first>Dave</first>
  <last>Orchard</last>
  <middle>Bryce</middle>
</name>
 
<name xmlns="http://www.openuri.org/name/1">
  <first>Dave</first>
  <last>Orchard</last>
  <pref1:prefix xmlns:pref1="http://www.openuri.org/name/pref/1">Mr.</pref1:prefix>
</name>
 
<name xmlns="http://www.openuri.org/name/1">
  <first>Dave</first>
  <last>Orchard</last>
  <pref2:prefix xmlns:pref2="http://www.example.org/name/pref/1">Mr.</pref2:prefix>
</name>

Example 4 – New components in new namespace(s) instances

The 2nd example shows the use of the optional middle name in the name namespace.  The 3rd and 4th examples show an additional prefix element in 2 different namespace names.  The first prefix, in the 3rd example, comes from a namespace name that is in the same domain as the name element’s namespace name.  The 4th example shows a completely different namespace name for the prefix.  It is probable that pref1:prefix was created by the name author and pref2:prefix by a 3rd party.

Using XML Schema, the name owner has 3 fairly unappealing options for the v2 schema for name and prefix, listed below and detailed subsequently:
1) optional prefix, extensibility retained, but name type does not refer to prefix;
2) optional prefix, extensibility is lost, name type refers to prefix;
3) required prefix, extensibility retained, name type refers to prefix but compatibility is lost;

If they leave the prefix as optional and retain the extensibility point, the best schema that they can write is:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.openuri.org/name/1" 
      xmlns:name="http://www.openuri.org/name/1"> 

  <xs:complexType name="name">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="last" type="xs:string"/>
      <xs:element name="middle" type="xs:string" minOccurs="0"/>
      <xs:any namespace="##other" processContents="lax" 
            minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>

</xs:schema>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.openuri.org/name/pref/1" 
      xmlns:pref="http://www.openuri.org/name/pref/1"> 

  <xs:element name="prefix" type="xs:string"/>

</xs:schema>

Example 5 – New components in new namespace(s) schema V2, no change to name type

This is not a very helpful XML Schema change.  The problem is that they cannot insert the reference to the optional pref:prefix element in the name schema and retain the extensibility point because of the aforementioned Non-Determinism Constraint. 

The core of the problem is that there is no mechanism for constraining the content of a wildcard.  For example, imagine that ns1 contains foo and bar.  It is not possible to take the SOAP schema – an example of a schema with a wildcard - and require that ns1:foo element must be a child of the header element and ns1:bar must not be a child of the header element using just W3C XML Schema constructs.   Indeed, the need for this functionality spawned some of the WSDL functionality. 

They could decide to lose the extensibility point (option #2), such as:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.openuri.org/name/1" 
      xmlns:name="http://www.openuri.org/name/1"
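      xmlns:pref="http://www.openuri.org/name/pref/1">

  <!-- The import lets ref="pref:prefix" resolve to the other namespace -->
  <xs:import namespace="http://www.openuri.org/name/pref/1"/>

  <xs:complexType name="name">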
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="last" type="xs:string"/>
      <xs:element name="middle" type="xs:string" minOccurs="0"/>
      <xs:element ref="pref:prefix" minOccurs="0"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>

</xs:schema>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.openuri.org/name/pref/1" 
      xmlns:pref="http://www.openuri.org/name/pref/1"> 

  <xs:element name="prefix" type="xs:string"/>

</xs:schema>
 

Example 6 – New components in new namespace(s) schema V2, no extensibility

The final option, #3, is adding a required prefix.  They must indicate that the change is incompatible.  One way is to create a new namespace name for the name element, as shown below:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.openuri.org/name/2" 
      xmlns:name="http://www.openuri.org/name/2"> 

  <xs:complexType name="name">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="last" type="xs:string"/>
      <xs:element name="middle" type="xs:string" minOccurs="0"/>
      <xs:element name="prefix" type="xs:string"/>
      <xs:any namespace="##other" processContents="lax" 
            minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>

</xs:schema>

Example 7 – New components in new namespace(s) schema V2, incompatible change

However, this breaks compatibility, which is often very undesirable.

The downsides of the 3 options for new components in new namespace name(s) design have been described.  Additionally, the design can result in specifications and namespaces that are inappropriately factored, as related constructs will be in separate namespaces. 

Version Strategy: all new components in existing or new namespace(s) for each version and a version identifier(#4)

Using a version identifier, the name instances would change to show the version of the name they use, such as:

<name xmlns="http://www.openuri.org/name/1" version="1.0">
  <first>Dave</first>
  <last>Orchard</last>
</name>
 
<name xmlns="http://www.openuri.org/name/1" version="1.0">
  <first>Dave</first>
  <last>Orchard</last>
  <middle>Bryce</middle>
</name>
 
<name xmlns="http://www.openuri.org/name/1" version="1.1">
  <first>Dave</first>
  <last>Orchard</last>
  <pref1:prefix xmlns:pref1="http://www.openuri.org/name/pref/1">Mr.</pref1:prefix>
</name>
 
<name xmlns="http://www.openuri.org/name/1" version="1.0">
  <first>Dave</first>
  <last>Orchard</last>
  <pref2:prefix xmlns:pref2="http://www.example.org/name/pref/1">Mr.</pref2:prefix>
</name>

<name xmlns="http://www.openuri.org/name/1" version="2.0">
  <first>Dave</first>
  <last>Orchard</last>
  <pref1:prefix xmlns:pref1="http://www.openuri.org/name/pref/1">Mr.</pref1:prefix>
</name>

Example 8 – New components in existing or new namespace(s) with version identifier instances

The last example shows that the prefix is now a mandatory part of the name.  As with Design #2, the schema for the optional prefix cannot fully express the content model.  A schema for the mandatory prefix is:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.openuri.org/name/1" 
      xmlns:name="http://www.openuri.org/name/1"
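      xmlns:pref="http://www.openuri.org/name/pref/1">

  <!-- The import lets ref="pref:prefix" resolve to the other namespace -->
  <xs:import namespace="http://www.openuri.org/name/pref/1"/>

  <xs:complexType name="name">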
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="last" type="xs:string"/>
      <xs:element name="middle" type="xs:string" minOccurs="0"/>
      <xs:element ref="pref:prefix"/>
      <xs:any namespace="##other" processContents="lax"
            minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>

</xs:schema>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.openuri.org/name/pref/1" 
      xmlns:pref="http://www.openuri.org/name/pref/1"> 

  <xs:element name="prefix" type="xs:string"/>

</xs:schema>

Example 9 – New components in existing or new namespace(s) with version identifier schema v2, incompatible change

A significant downside with using version identifiers is that software that supports both versions of the name must perform special processing on top of XML and namespaces.  For example, many components “bind” XML types into particular programming language types.  Custom software must process the version attribute before using any of the “binding” software.  In Web services, toolkits often take SOAP body content, parse it into types and invoke methods on the types.  There are rarely “hooks” for the custom code to intercept processing between the “SOAP” processing and the “name” processing.  Further, if version attributes are used by any 3rd party extensions – say pref:prefix has a version – then the schema cannot refer to the correct prefix.
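The extra processing step described above can be sketched in a few lines of Python.  This is only an illustration, not part of any toolkit: the functions bind_v1, bind_v2 and bind_name are hypothetical names, and the sketch assumes that a change in the major version number signals an incompatible change while minor versions stay compatible.

```python
import xml.etree.ElementTree as ET

NAME_NS = "http://www.openuri.org/name/1"
PREF_NS = "http://www.openuri.org/name/pref/1"


def bind_v1(elem):
    """Hypothetical binding for versions 1.x: the prefix is optional."""
    return {
        "first": elem.findtext(f"{{{NAME_NS}}}first"),
        "last": elem.findtext(f"{{{NAME_NS}}}last"),
        "prefix": elem.findtext(f"{{{PREF_NS}}}prefix"),  # None when absent
    }


def bind_v2(elem):
    """Hypothetical binding for version 2.0: the prefix is mandatory."""
    bound = bind_v1(elem)
    if bound["prefix"] is None:
        raise ValueError("version 2.0 requires pref:prefix")
    return bound


def bind_name(xml_text):
    """Inspect the version attribute first, then choose a binding."""
    elem = ET.fromstring(xml_text)
    # Assumption: a missing version attribute means version 1.0.
    major = elem.get("version", "1.0").split(".")[0]
    binders = {"1": bind_v1, "2": bind_v2}
    if major not in binders:
        raise ValueError(f"unsupported major version: {major}")
    return binders[major](elem)
```

The dispatch in bind_name is exactly the custom code that has to run before any generic binding layer, which is why the lack of hooks in many toolkits is a problem for this strategy.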

Version Strategy: All new components in existing or new namespace(s) for each compatible version(#3)

It is possible to create Schemas with additional optional components.  This requires re-using the namespace name for optional components and special schema design techniques.  The re-using namespace rule is:

8.    Re-use namespace names Rule: If a backwards compatible change can be made to a specification, then the old namespace name SHOULD be used in conjunction with XML’s extensibility model.

An important conclusion is that a new namespace name is not required whenever a specification evolves, only if an incompatible change is made. 

9.    New namespaces to break Rule: A new namespace name is used when backwards compatibility is not permitted, that is software MUST break if it does not understand the new language components. 

Example #2 showed that it is not possible to have a wildcard with ##any (or even ##targetNamespace) following optional elements in the target namespace.  The solution to this problem is to introduce an element in the schema that will always appear if the extension appears.  The content model of the extensibility point is then the element plus the extension.  There are two styles for this.  The first was published in an earlier version of this article in December 2003.  It uses an Extension element with the extensions nested inside.  The second was published in July 2004, then updated on MSDN.  It uses a Sentry or Marker element with extensions following it.

A name type with extension elements is:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.openuri.org/name/1" 
      xmlns:name="http://www.openuri.org/name/1"> 

  <xs:complexType name="name">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="last" type="xs:string"/>
      <xs:element name="middle" type="xs:string" minOccurs="0"/>
      <xs:element name="Extension" type="name:ExtensionType" 
             minOccurs="0" maxOccurs="1"/>
      <xs:any namespace="##other" processContents="lax" 
            minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>
 
  <xs:complexType name="ExtensionType">
    <xs:sequence>
      <xs:any processContents="lax" minOccurs="1" 
             maxOccurs="unbounded" namespace="##targetNamespace"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType> 
 
</xs:schema>

Example 10 – New components in existing or new namespace(s) with Extension Type Schema version 1

Because each extension in the target namespace is inside an Extension element, each subsequent target namespace extension will increase nesting by another layer.  While this layer of nesting per extension is not desirable, it is what can be accomplished today when applying strict XML Schema validation.  It seems to at least this author that potentially having multiple nested elements is worthwhile if multiple compatible revisions can be made to a language.  This technique allows validation of extensions in the target namespace while retaining validation of the target namespace itself.

The previous schema allows the following sample name:

<name xmlns="http://www.openuri.org/name/1">
  <first>Dave</first>
  <last>Orchard</last>
  <Extension>
    <prefix>Mr.</prefix>
  </Extension>
</name>

Example 11 – New components in existing or new namespace(s) with Extension Type instances
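A consumer that wants to work with such instances has to walk the nested Extension elements.  The following Python sketch shows one way this might look; flatten_extensions is a hypothetical helper, and the suffix element mentioned in the usage note is an invented later extension used only for illustration.

```python
import xml.etree.ElementTree as ET

NAME_NS = "http://www.openuri.org/name/1"
EXTENSION = f"{{{NAME_NS}}}Extension"


def flatten_extensions(name_elem):
    """Collect (local name, text) pairs from nested Extension elements.

    Each compatible revision adds one more level of nesting, so the
    result is ordered from the oldest extension to the newest.
    """
    found = []
    container = name_elem.find(EXTENSION)
    while container is not None:
        for child in container:
            if child.tag != EXTENSION:
                local = child.tag.rsplit("}", 1)[-1]
                found.append((local, (child.text or "").strip()))
        # Descend into the next revision's Extension element, if any.
        container = container.find(EXTENSION)
    return found
```

Applied to the instance in Example 11 this yields a single pair for prefix.  If a later revision nested a second Extension containing, say, a suffix element, the same loop would return both pairs without any change to the consumer.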

The namespace author can create a schema for this type:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.openuri.org/name/1" 
      xmlns:name="http://www.openuri.org/name/1"> 

  <xs:complexType name="name">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="last" type="xs:string"/>
      <xs:element name="middle" type="xs:string" minOccurs="0"/>
      <xs:element name="Extension" type="name:PrefixExtensionType" 
             minOccurs="0" maxOccurs="1"/>
      <xs:any namespace="##other" processContents="lax" 
            minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>
 
  <xs:complexType name="PrefixExtensionType">
    <xs:sequence>
       <xs:element name="prefix" type="xs:string"/>
       <xs:element name="Extension" type="name:PrefixExtensionType" 
             minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType> 
 
  <xs:complexType name="ExtensionType">
    <xs:sequence>
      <xs:any processContents="lax" minOccurs="1" 
             maxOccurs="unbounded" namespace="##targetNamespace"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType> 
 
</xs:schema>

Example 12 – New components in existing or new namespace(s) with Extension Type Schema version 2

The advantage of this design technique is that a forwards and backwards compatible Schema V2 can be written.  The V2 schema can validate documents with or without the prefix, and the V1 schema can validate documents with or without the prefix. 

Further, the re-use of the same namespace has better tooling support.  Many applications use a single schema to create the equivalent programming constructs.  These tools often work best with single namespace support for the “generated” constructs.  The re-use of the namespace name allows at least the namespace author to make changes to the namespace and perform validation of the extensions. 

An obvious downside of this approach is the complexity of the schema design. Another downside is that changes are linear, so two extensions that could otherwise be made in parallel must be nested one inside the other.

Indicating Incompatible changes

Given adoption of the Must Ignore rule, it is often the case that the creator of an extension or a new version wants to require that the consumer understand the extension, overriding the Must Ignore rule.  The previous section showed how a version author could use new namespace names, element names, or version numbers to indicate an incompatible change.  An extension author does not have these mechanisms available for indicating an incompatible or mandatory extension.  A language provider that wants to allow extension authors to indicate an incompatible extension must provide a mechanism for indicating that consumers must understand the extension.

10.    Provide Must Understand Rule: Container languages SHOULD provide a “Must Understand” model for dealing with optionality of extensions that override a default Must Ignore Rule.

This rule and the Must Ignore rule work together to provide a stable and flexible processing model for extensions. 

Must Understand flag

Arguably the simplest and most flexible over-ride technique is a Must Understand flag that indicates whether the item must be understood.  The SOAP [8], WSDL [9], and WS-Policy [10] attributes and values for specifying understanding are respectively: soap:mustUnderstand="1", wsdl:required="1", wsp:Usage="wsp:Required".  SOAP is probably the most common case of a container that provides a Must Understand model.  In SOAP, the default value is 0, which is effectively the Must Ignore rule. 

A language designer can re-use an existing Must Understand model by constraining their language to an existing Must Understand model.  A number of Web services specifications have done this by specifying that the components are SOAP header blocks, which explicitly brings in the SOAP Must Understand model.

A language designer can design a Must Understand model into their language.  A Must Understand flag allows the producer to insert extensions into the container and use the Must Understand attribute to over-ride the Must Ignore rule.  This allows producers to extend instances without changing the namespace of the extension element’s parent, retaining backwards compatibility.  Obviously the consumer must be extended to handle new extensions, but there is now a loose coupling between the language’s processing model and the extension’s processing model.  A schema that provides a Must Understand flag is shown below:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.openuri.org/name/1" 
      xmlns:name="http://www.openuri.org/name/1"> 

  <xs:complexType name="name">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="last" type="xs:string"/>
      <xs:element name="middle" type="xs:string" minOccurs="0"/>
      <xs:element name="Extension" type="name:ExtensionType" 
             minOccurs="0" maxOccurs="1"/>
      <xs:any namespace="##other" processContents="lax" 
            minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="name:mustUnderstand"/>
    <xs:anyAttribute/>
  </xs:complexType>
 
  <xs:complexType name="ExtensionType">
    <xs:sequence>
      <xs:any processContents="lax" minOccurs="1" 
             maxOccurs="unbounded" namespace="##targetNamespace"/>
    </xs:sequence>
    <xs:anyAttribute/>
  </xs:complexType>
 
  <xs:attribute name="mustUnderstand" type="xs:boolean"/>
 
</xs:schema>

Example 13 – New components in existing or new namespace(s) with Extension Type Schema and Must Understand

An example of an instance of a 3rd party indicating that a prefix component is an incompatible change:

<name xmlns="http://www.openuri.org/name/1"
      xmlns:name="http://www.openuri.org/name/1">
  <first>Dave</first>
  <last>Orchard</last>
  <pref2:prefix xmlns:pref2="http://www.example.org/name/pref/1"
        name:mustUnderstand="true">Mr.</pref2:prefix>
</name>

Example 14 – New components in existing or new namespace(s) instance with Must Understand

Specification of a Must Understand flag must be treated carefully as it can be computationally expensive.  Typically a processor will either perform a scan for Must Understand components to ensure it can process the entire document, or incrementally process the instance while being prepared to roll back or undo any processing if a Must Understand component that it does not understand is found.
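The first of these two options, scanning before processing, might be sketched as follows in Python.  The function names and the UNDERSTOOD_NAMESPACES set are assumptions made for illustration; the flag itself is the name:mustUnderstand attribute from the schema above, which is declared as xs:boolean and so may be written as true or 1.

```python
import xml.etree.ElementTree as ET

NAME_NS = "http://www.openuri.org/name/1"
MUST_UNDERSTAND = f"{{{NAME_NS}}}mustUnderstand"
# Assumption: the namespaces this particular consumer implements.
UNDERSTOOD_NAMESPACES = {NAME_NS}


def namespace_of(tag):
    """Extract the namespace name from an ElementTree '{ns}local' tag."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def not_understood(root):
    """Scan the whole instance for flagged elements the consumer cannot process."""
    problems = []
    for elem in root.iter():
        flagged = elem.get(MUST_UNDERSTAND, "false").strip() in ("true", "1")
        if flagged and namespace_of(elem.tag) not in UNDERSTOOD_NAMESPACES:
            problems.append(elem.tag)
    return problems


def process(xml_text):
    """Reject the instance up front, or process it under the Must Ignore rule."""
    root = ET.fromstring(xml_text)
    problems = not_understood(root)
    if problems:
        raise ValueError(f"mandatory extensions not understood: {problems}")
    # Must Ignore: unflagged children in unknown namespaces are skipped.
    return [e.tag for e in root if namespace_of(e.tag) in UNDERSTOOD_NAMESPACES]
```

The cost mentioned in the text is visible here: not_understood visits every element before any real work starts.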

There are other refinements related to Must Understand.  One example is providing an element that indicates which extension namespaces must be understood, which avoids the scan of the instance for Must Understand flags.
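As a rough Python illustration of this refinement, suppose the language defined an element listing the mandatory extension namespaces as the first child of the root.  The element name mustUnderstandNamespaces, its whitespace-separated content, and its position are all hypothetical choices for this sketch and are not defined by any of the schemas in this article.

```python
import io
import xml.etree.ElementTree as ET

NAME_NS = "http://www.openuri.org/name/1"
# Hypothetical declaration element, assumed to be the first child of the root.
DECLARATION = f"{{{NAME_NS}}}mustUnderstandNamespaces"
# Assumption: the namespaces this particular consumer implements.
UNDERSTOOD_NAMESPACES = {NAME_NS}


def missing_mandatory_namespaces(stream):
    """Read only as far as the first completed element of the instance.

    If that element is the declaration, compare the namespaces it lists
    against what the consumer understands; otherwise nothing is mandatory.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != DECLARATION:
            return []
        declared = (elem.text or "").split()
        return [ns for ns in declared if ns not in UNDERSTOOD_NAMESPACES]
    return []
```

Because the decision is made after the first element, the consumer does not need to inspect every extension for a flag before it starts processing.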

Type extension

Another option for indicating mandatory requirements is allowing extension authors to use other schema mechanisms for extending the main type, such as type extension.  The language designer allows for type extension, and they must specify that type extensions must be understood. 

<nameWithPrefix xmlns="http://www.openuri.org/name/pref/1">
  <first>Dave</first>
  <last>Orchard</last>
  <prefix>Mr.</prefix>
</nameWithPrefix>

Example 15 – New components in existing or new namespace(s) with Type Extension instance

The nameWithPrefix schema is an extension of the name with the prefix added.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.openuri.org/name/pref/1" 
      xmlns:pref="http://www.openuri.org/name/pref/1"
      xmlns:name="http://www.openuri.org/name/1"> 

  <xs:import namespace="http://www.openuri.org/name/1"/>
 
  <xs:complexType name="nameWithPrefix">
    <xs:complexContent>
      <xs:extension base="name:name">
        <xs:sequence>
          <xs:element name="prefix" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>

Example 16 – Illegal new components in existing or new namespace(s) with Type Extension schema mandatory extension

Like many of the attempts to write a schema so far, this schema and other variations are problematic.  This schema is illegal because the prefix in the pref namespace and the wildcard with ##other are non-deterministic: an instance of prefix could match either one.  An alternative is to not have the wildcard at all, and rely upon subtyping for extension.  But this prevents any kind of compatible evolution as both sides must have the new schema to understand the type.  The language designer has to choose between allowing compatible extensibility/versioning OR incompatible extensibility when subtyping is used. 

The language designer has the option of using subtyping for incompatible versioning as they could create a nameWithPrefix type that adds the prefix in the same namespace.  This does not enable extension authors to indicate incompatible extensions.

Substitution Groups

Another mechanism for extending a type in XML Schema is substitution groups.  Substitution groups enable an element to be declared as substitutable for another.  This can only be used for incompatible extensions as the consumer must understand the substitution type.  Substitution groups require that elements are available for substitution, so the name designer must have provided a name element in addition to the name type. 

Substitution groups do allow a single extension author to indicate that their changes are mandatory.  The limitation is that the extension author has now taken over the type’s extensibility.  A visual way of imagining this is that the type tree has now been moved from the language designer over to the extension author.  And the language designer probably does not want their type to be “hijacked”. 

However, this is not substantially different than an extension being marked with a “Must Understand”. In either case – with the extensions higher up in the tree (sometimes called top-typing) or lower in the tree (bottom-typing) – a new type is effectively created. 

The difference is that there can only be 1 element at the top of an element hierarchy.  If multiple mandatory extensions are added, then the only way to compose them together is at the bottom of the type because that is where the extensibility is.

Substitution groups do not allow a language designer and an extension author to incompatibly change the language as they end up conflicting over what to call the name element.  Thus substitution groups are a poor mechanism for allowing an extension author to indicate that their changes are incompatible.  A Must Understand flag is a superior method because it allows multiple extension authors to mix their mandatory extensions with a language designer’s versioning strategy. Hence language designers should prevent substitution groups and provide a Must Understand flag or other model when they wish to allow 3rd parties to make incompatible changes.

In some cases, a language does not provide a Must Understand mechanism.  In the absence of a Must Understand model, the only way to force consumers to reject a message if they don’t understand the extension namespace is to change the namespace name of the root element, but this is rarely desirable.

Extension versus Versioning

The usage of namespace names for identifying components has led to the interesting situation where the distinction between an extension and a version can be quite blurred, depending upon the language designer’s choices. 

One rough way of thinking of these two concepts is that extension is typically the addition of components over space; that is, designers other than the language’s creator are adding components. Versioning is typically the addition of components over time, under the designer’s explicit control.  In either case, a change to the language may be done in a compatible or an incompatible way.  We typically distinguish the terms by their simple cases: extensions are compatible, decentralized additions, while versions are compatible or incompatible centralized changes.  But these distinctions break down depending upon how the language is designed.

There are a couple of scenarios that illustrate the ambiguity in these terms.  Imagine that version 1.0 of a Name consists of “First” and “Last” elements.  A 3rd party author extends the Name with a “middle” element in a new namespace which they control. 

In scenario 1, the Name author decides to formally incorporate the middle name as an optional (and hence compatible) addition to the name, producing version 1.1 of the Name type.  They do this by referring to the third party’s definition and namespace for middle names.  This is typically considered a new “version” of the Name and would probably result in a new schema definition.  If the Name author re-uses namespace names for compatible revisions, there will be no difference in an instance document containing middle that is of Version 1.0 or Version 1.1 type.  The instance documents are the same, and thus the distinction between a “version” and an “extension” is meaningless for an individual document.

In scenario 2, the middle author decides that the middle name is a mandatory part of the Name type.  They were provided a mechanism for indicating an incompatible change and they use it.  Now an instance of Name with the middle is incompatible with version 1.0 of the Name.  What “version” of the Name is this middle, and is the middle an “extension” or a “version”?  It isn’t 1.0.  It’s probably more accurately thought of as a version defined by the 3rd party.  Again, the presence of the “extension” is actually an incompatible change. 

These two examples – a 3rd party extension being added into a compatible version and a 3rd party extension resulting in an incompatible version – show that the ability to specify (in)compatibility has blurred the distinction between these two terms.  

Determinism

This article has devoted considerable space to deterministic content models, so it is worth describing the W3C XML Schema determinism rules in more detail.  The reader is reminded that these rules do not apply to every schema language: other XML schema languages such as RelaxNG do not use them, and so do not suffer from the contortions one is forced through when using W3C XML Schema.  XML DTDs and W3C XML Schema have a rule that requires schemas to have deterministic content models.  From the XML 1.0 specification:

“For example, the content model ((b, c) | (b, d)) is non-deterministic, because given an initial b the XML processor cannot know which b in the model is being matched without looking ahead to see which element follows the b.”

The use of ##any means there are some schemas that we might like to express, but that aren’t allowed. 

·        Wildcards with ##any, where minOccurs does not equal maxOccurs, are not allowed before an element declaration.  An instance of the element would be valid for the ##any or the element.  ##other could be used.

·        The element before a wildcard with ##any must have cardinality of maxOccurs equals its minOccurs.  If these were different, say minOccurs="1" and maxOccurs="2", then the optional occurrences could match either the element definition or the ##any.  As a result of this rule, the minOccurs must be greater than zero.

·        Derived types that add element definitions after a wildcard with ##any must be avoided.  A derived type might add an element definition after the wildcard, then an instance of the added element definition could match either the wildcard or the derived element definition.

11.    Be Deterministic rule: Use of wildcards MUST be deterministic.  Location of wildcards, namespace of wildcard extensions, minOccurs and maxOccurs values are constrained, and type restriction is controlled.
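The constraints listed above can be made concrete with a small checker.  The following Python sketch is a deliberately simplified model, not the W3C XML Schema Unique Particle Attribution algorithm: it only handles a flat sequence of particles, it assumes all declared elements live in a single target namespace unless stated otherwise, and the names Particle, overlaps and ambiguities are invented for this illustration.

```python
from dataclasses import dataclass

UNBOUNDED = float("inf")
TARGET_NS = "http://www.openuri.org/name/1"


@dataclass
class Particle:
    kind: str                    # "element" or "wildcard"
    match: str                   # local name, or ##any / ##other / ##targetNamespace
    min_occurs: int = 1
    max_occurs: float = 1
    namespace: str = TARGET_NS   # namespace of an element declaration


def wildcard_accepts(rule, namespace):
    """Simplified namespace test for the three wildcard keywords."""
    if rule == "##any":
        return True
    if rule == "##other":
        return namespace != TARGET_NS
    return namespace == TARGET_NS  # ##targetNamespace


def overlaps(a, b):
    """True when some element could be matched by both particles."""
    if a.kind == "element" and b.kind == "element":
        return (a.namespace, a.match) == (b.namespace, b.match)
    if a.kind == "wildcard" and b.kind == "wildcard":
        return "##any" in (a.match, b.match) or a.match == b.match
    elem, wild = (a, b) if a.kind == "element" else (b, a)
    return wildcard_accepts(wild.match, elem.namespace)


def ambiguities(sequence):
    """List (earlier, later) particle pairs that compete for the same element."""
    found = []
    for i, particle in enumerate(sequence):
        if particle.min_occurs == particle.max_occurs:
            continue  # fixed count: it is always clear when this particle is done
        for later in sequence[i + 1:]:
            if overlaps(particle, later):
                found.append((particle.match, later.match))
            if later.min_occurs > 0:
                break  # a required particle ends the look-ahead
    return found
```

Modelling the name type from Example 10 with its ##other wildcard reports no conflicts, while swapping the wildcard to ##any reports that both middle and Extension compete with it.  That is the situation the second bullet above rules out.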

As shown earlier, a common design pattern is to provide an extensibility point – not an element – allowing any namespace at the end of a type.  This is typically done with <xs:any namespace="##any">.

Determinism makes this unworkable as a complete solution in many cases.  Firstly, the extensibility point can only occur after required elements in the original schema, limiting the scope of extensibility in the original schema.  Secondly, backwards compatible changes require that the added element is optional, which means minOccurs="0".  Determinism prevents us from placing an element with minOccurs="0" before an extensibility point of ##any.  Thus, when adding an element at an extensibility point, the author can make the element optional and lose the extensibility point, or the author can make the element required and lose backwards compatibility. 

Why is this hard?

We’ve shown that using XML and W3C XML Schema to achieve loose coupling via compatible changes that fully utilize yet do not require new schema definitions is hard.  W3C XML Schema documents allowing extensibility and versioning are more cumbersome and at the same time less expressive than one might like.  The structural limitations introduced by W3C XML Schema's handling of extensibility are a consequence of W3C XML Schema's design and are not an inherent limitation of schema-based structures.

With respect to W3C XML Schema, it would be useful to be able to add elements into arbitrary places, such as before other elements, but the determinism constraint precludes this.  A less restrictive type of deterministic model could be employed, such as the “greedy” algorithm defined in the URI specification [5].  This would allow optional elements before wildcards and remove the need for the Extension type we introduced.  This still does not allow wildcards before elements, as the wildcard would match the elements instead.  Further, this still does not allow wildcards and type extension of the type to co-exist.  A “priority” wildcard model, where an element that could be matched by either a wildcard or an element declaration is matched by the element declaration if possible, would allow wildcards before and after element declarations.  However, this model does not address the typical multi-namespace approach of schema design.  A wildcard that only allowed elements that had not been defined – effectively other namespaces plus anything not defined in the target namespace – may be a more useful model.  These changes would also allow cleaner mixing of inheritance and wildcards.  But that still means that the author has to sprinkle wildcards throughout their types.  A type-level any element combined with the aforementioned wildcard changes is needed.  One potential solution is that the sequence declaration could have an attribute specifying that extensions be allowed in any place, with commensurate attributes specifying namespaces, elements, and validation rules.  Finally, an extension mechanism that enabled replacement of the wildcard with an updated content model would enable modularity of the compatible and incompatible schemas.

The problem with even this last approach is that with a specific schema it is sometimes necessary to apply the same schema in a strict or relaxed fashion in different parts of a system.  A long-standing rule for the Internet is the Robustness Principle, articulated in the Internet Protocol [4], as “In general, an implementation must be conservative in its sending behavior, and liberal in its receiving behavior”.  In schema validation terms, a producer can apply a schema in a strict way while a consumer can apply a schema in a relaxed way.  In this case, the degree of strictness is not an attribute of the schema, but of how it is used. A solution that appears to solve these problems is defining a form of schema validation that permits an open content model that is used when schemas are versioned.  We call this model validation 'by projection', and it works by ignoring, rather than rejecting, component names that appear in a message that are not explicitly defined by the schema.  This is possible using partial validation in XML Schema.  A two pass schema validation model can do this, where the first pass finds the “extra” content, this is then removed from the components to validate, and a second pass validation is done.
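The two-pass idea can be illustrated with a short Python sketch.  It does not use a real schema processor: the KNOWN and REQUIRED sets stand in for what a consumer's schema defines, and validate_strict is a hypothetical placeholder for whatever strict validation the consumer would normally run.

```python
import xml.etree.ElementTree as ET

NAME_NS = "http://www.openuri.org/name/1"
# Assumption: the element names defined by the consumer's version of the schema.
KNOWN = {f"{{{NAME_NS}}}{n}" for n in ("name", "first", "last", "middle")}
REQUIRED = [f"{{{NAME_NS}}}first", f"{{{NAME_NS}}}last"]


def project(elem):
    """Pass 1: copy the element, dropping every child the schema does not define."""
    projected = ET.Element(elem.tag, dict(elem.attrib))
    projected.text = elem.text
    for child in elem:
        if child.tag in KNOWN:
            projected.append(project(child))
    return projected


def validate_strict(elem):
    """Pass 2 stand-in: first and last must lead, and nothing unknown may appear."""
    children = [child.tag for child in elem]
    return children[:2] == REQUIRED and all(tag in KNOWN for tag in children)


def validate_by_projection(xml_text):
    """Ignore, rather than reject, content the schema does not define."""
    return validate_strict(project(ET.fromstring(xml_text)))
```

A producer would call validate_strict directly on what it sends, while a consumer would call validate_by_projection on what it receives, so the same definitions are applied strictly on one side and in a relaxed way on the other.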

A final comment on XML Schema extensibility is that there is still the unmet need for the ability to define schemas that validate known extensions while retaining extensibility.  An author will want to create a schema based upon an extensible schema but mix in other known schemas in particular wildcards while retaining the wildcard extensibility.  We encounter this difficulty in areas like describing SOAP header blocks.  The topic of composing schemas from many schemas is difficult yet pressing.

Leaving the topic of wildcard extensibility, the use of type extension over the web might be more palatable if the instance document could express a base type to use when the consumer does not understand the extension type, as in xsi:basetype="".  The consumer could then fall back to using the base type if it did not understand the extension type.

Another area for architectural improvement is that XML – or even XML Schema – could have provided a Must Understand model.  As things stand, each vocabulary that provides a Must Understand model re-invents the Must Understand wheel.  XML could have provided an xml:mustUnderstand attribute and model that each language could use.  Tim Berners-Lee articulated the need for this in XML in his design note on mandatory extensions in February 2000 [19], but neither XML 1.0 nor 1.1 included this model.

Finally, there is ambiguity in compliance testing for W3C XML Schema implementations. The W3C XML Schema test collection [15] does not test some of the more common cases that have been precluded here.  For example, the wildcard tests cover a different style, which is xs:any inside a complex type.  These do not cover some of the non-deterministic cases, typically achieved by combining minOccurs/maxOccurs variations with ##any, or combining inheritance with ##any.  Potentially as a result, some implementations do not correctly test for non-determinism, which may yield non-interoperable documents.

One common concern is about implementation support for these features and combinations.  These samples have been tried in many different schema parsers and toolkits, such as XML Beans, SQC, and JAX-RPC.  While it’s impossible to know whether all implementations support these rules, there seems to be good support for what was tested.  The author is certainly interested in hearing about toolkits that don’t support these rules. 

Other technologies

The W3C XML Schema Working Group has heard and taken to heart many of these concerns.  They have plans to remedy some of these issues in XML Schema 1.1 [21].  They currently are looking at a “weak wildcard” model, which solves some but not all of the problems.  There is no public Working Draft of a Schema 1.1 with improved extensibility or versioning at the time of writing this article.

A simple analysis of doing compatible extensibility and versioning using RDF and OWL is available [21].  In general, RDF and OWL offer superior mechanisms for extensibility and versioning.  RDF and OWL explicitly allow extension components to be added to components.  And further, the RDF and OWL model builds in the notion of “Must Ignore Unknowns” as an RDF/OWL processor will absorb the extra components but do nothing with them.  An extension author can require that consumers understand the extension by changing the type using a type extension mechanism.  

RelaxNG is another schema language.  It explicitly allows extension components to be added to other components as it does not have the non-determinism constraint.

Conclusion

This article started many months ago.  At roughly the same time as a previous version was published, the W3C TAG decided that the topic of versioning and extensibility was important enough to web architecture to work on a finding [22] and include material in the Web Architecture document [23].  While this article provided a starting point for the TAG material, that material will cover a broader scope and progress in a more interactive and iterative fashion than any article can.  Readers can follow an ongoing version of this article and the TAG material for an ongoing treatment of the area of extensibility and versioning.

This article describes a number of questions, decisions and rules for using XML, W3C XML Schema, and XML Namespaces in language construction and extension.  The main goal of the set of rules is to allow language designers to know their options for language design, and ideally make backwards- and forwards-compatible changes to their languages to achieve loose coupling between systems.

References

              

  1. Extending and Versioning XML Languages, by Dave Orchard, ongoing location, http://www.pacificspirit.com/Authoring/Compatibility/ExtendingAndVersioningXMLLanguages.html
  2. Free Online Dictionary of Computing, http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?forward+compatible
  3. Flexible XML Processing Profile, http://www.upnp.org/download/draft-goland-fxpp-01.txt
  4. IETF RFC 791, http://www.ietf.org/rfc/rfc791.txt
  5. IETF RFC 2396, http://www.ietf.org/rfc/rfc2396.txt
  6. IETF RFC 2518, http://www.ietf.org/rfc/rfc2518.txt
  7. IETF RFC 2616, http://www.ietf.org/rfc/rfc2616.txt
  8. SOAP 1.1, http://www.w3.org/TR/SOAP/
  9. WSDL 1.1, http://www.w3.org/TR/wsdl.html
  10. WS-Policy Framework, ftp://ftpna2.bea.com/pub/downloads/WS-Policy.pdf
  11. W3C Note, Web Architecture: Extensible Languages, http://www.w3.org/TR/1998/NOTE-webarch-extlang-19980210
  12. W3C XML 1.0, http://www.w3.org/TR/REC-xml
  13. W3C XML Namespaces, http://www.w3.org/TR/REC-xml-names/
  14. W3C XML Schema Part 1, http://www.w3.org/TR/xmlschema-1/
  15. W3C XML Schema Working Group’s Test collection for Any, http://www.w3.org/XML/2001/05/xmlschema-test-collection/result-ms-wildcards.htm
  16. XML.com W3C XML Schema design Patterns by Dare Obasanjo, http://www.xml.com/pub/a/2002/07/03/schema_design.html
  17. XML.Com Versioning XML by Dave Orchard, http://www.xml.com/pub/a/2003/12/03/versioning.html
  18. MSDN Designing Extensible, Versionable XML Formats by Dare Obasanjo, http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnexxml/html/xml07212004.asp
  19. Tim Berners-Lee’s writings on evolution, extensibility and Must Understand:

·         http://www.w3.org/DesignIssues/Mandatory.html

·         http://www.w3.org/DesignIssues/Extensible.html

·        http://www.w3.org/DesignIssues/Evolution.html

  20. http://lists.w3.org/Archives/Public/w3c-dist-auth/1997AprJun/0190.html
  21. Dave Orchard’s writings on extensibility and Versioning:

·        http://www.pacificspirit.com/Authoring/Compatibility

·        http://www.pacificspirit.com/Authoring/Compatibility/OWLRDFExtensibility.html

·        http://www.pacificspirit.com/Authoring/Compatibility/ProvidingCompatibleSchemaEvolution.html

·        http://www.pacificspirit.com/Authoring/Compatibility/Schema11Extensibility.html

  22. W3C TAG Finding on extensibility and versioning, http://www.w3.org/2001/tag/doc/versioning
  23. W3C TAG Web Architecture document section on extensibility and versioning, http://www.w3.org/TR/webarch/#ext-version

 

Acknowledgements

The author thanks the many reviewers that have contributed to the article, particularly David Bau, William Cox, Ed Dumbill, Chris Ferris, Yaron Goland, Hal Lockhart, Mark Nottingham, Jeffrey Schlimmer, Cliff Schmidt, and Norman Walsh. 

About the Author

David Orchard is a technical director in BEA Systems' CTO Office, focusing on Web services standards. He has been an elected member of the W3C Technical Architecture Group and is an appointed editor of the W3C TAG extensibility and versioning finding; Web services Architecture, XML Protocol, and Advisory committees. He is currently or has been a co-editor of the Web services Architecture, Web services Usage Scenarios, WS-Coordination, WS-ReliableMessaging, WS-Addressing, WS-Eventing, WS-MetadataExchange, WS-Transfer, SOAP-Conversation, XML Link, and XInclude specifications. He has written numerous technical articles and is a frequent speaker on various Internet related technologies.