Substitutability Requirements for Different Applications

Eve Maler, Arbortext, Inc.

elm@arbortext.com


Published 7 April 1999

Revision History
Revision 0.1 13 March 1999
Sent for private review by a few WG members.
Revision 1.0 7 April 1999
Initial publication to the WG.

Substitutability is interesting to our schema efforts because it is another way of saying “code reusability.” If a schema is changed as a result of transformation into a new schema or inheritance of a schema definition in order to create a similar (but not identical) one, it may harm substitutability. We need to examine the substitutability effects of common schema change scenarios in order to figure out where the code reuse “pot of gold” really lies, or if it even exists. In our discussions to date, substitutability has been used as a virtual synonym for additive inheritance (adding properties to a schema). In this paper, I show that this view is too simplistic.

Table of Contents
1. Introduction
2. Classes of Application
3. Comparing Reusability of Different Application Classes
Example of a Non-Reusable Stylesheet
Assessing Application Reusability
4. Conclusion

Chapter 1. Introduction

Substitutability is interesting to our schema efforts because it is another way of saying “code reusability.” If a schema is changed as a result of transformation into a new schema or inheritance of a schema definition in order to create a similar (but not identical) one, it may harm substitutability. We need to examine the substitutability effects of common schema change scenarios in order to figure out where the code reuse “pot of gold” really lies, or if it even exists. In our discussions to date, substitutability has been used as a virtual synonym for additive inheritance (adding properties to a schema). In this paper, I show that this view is too simplistic.

First, we need a definition. At the March 1999 face-to-face meeting, Michael Sperberg-McQueen proposed the following definition: “The ability to substitute a value of type T1 wherever a value of type T2 is expected.” In order to examine the effect of schema modification on this quality, I would rather examine the entire set of valid values for T1 and T2. Thus, I propose the following definition (and shortcut):

Given an original schema type definition T1 and a particular application that operates on valid values conforming to it, substitutability is the generalized ability of all valid values conforming to a modified definition T2 to allow the same application to work as designed. For short, the definition T2 is substitutable for T1 in the context of this application.

Another way to cut the definition is to say that a particular application operating on a substitutable schema is “perfectly reusable” without change:

Given an original schema type definition T1 and a particular application that operates on valid values conforming to it, reusability is the generalized ability of the application to work as designed with all valid values conforming to a modified definition T2.

In these definitions, what I mean by “generalized” is that if an application feature reasonably makes use of every fact that the original schema definition can guarantee, it has either a zero chance (if reusable) or a non-zero chance (if not reusable) of failing to work properly if the schema changes. For example, if the original schema has a required element and the schema is changed to remove that element, any application feature that reasonably might depend on the presence of the element will be in trouble.


Chapter 2. Classes of Application

There are four common classes of application functionality that I will examine closely in this paper.

Instance creation/generation applications

A number of creation scenarios care about substitutability. If you develop a generator that builds XML documents from various pieces of contributed data from a database, you rely on certain facts about the schema for the document type: that element A has to be placed in thus-and-such a location, that attribute B can have only one of three possible values, etc. Some kinds of schema change/inheritance will break your generator; it will start producing invalid values. Thus it is not perfectly reusable.

As another example, if you maintain a structured authoring and publishing environment based on a schema, certain kinds of schema change/inheritance will require you to upgrade that environment.
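The generator scenario can be sketched in a few lines of Python. This is a toy illustration only: it borrows the division content model `(title, p+)` from the stylesheet example later in this paper, encodes content models as regular expressions over child element names (an assumption made for brevity, not a real validator), and invents a hypothetical `abstract` element to stand for the added property.

```python
import re

# Content models encoded as regular expressions over space-separated child
# element names. This encoding is an illustrative assumption, not a real
# validator.
ORIGINAL_DIV = r"title( p)+"                # <!ELEMENT div (title, p+)>
OPTIONAL_ADDED = r"title( abstract)?( p)+"  # hypothetical optional element added
REQUIRED_ADDED = r"title abstract( p)+"     # hypothetical required element added


def generate_div(paragraph_count):
    """Generator written against the original model: a title, then paragraphs."""
    return ["title"] + ["p"] * paragraph_count


def is_valid(children, model):
    """Check a sequence of child element names against a content model."""
    return re.fullmatch(model, " ".join(children)) is not None


output = generate_div(2)
print(is_valid(output, ORIGINAL_DIV))    # checked against the schema it was built for
print(is_valid(output, OPTIONAL_ADDED))  # the generator simply never emits the new element
print(is_valid(output, REQUIRED_ADDED))  # the generator does not know to create it
```

Under these assumptions, the unchanged generator keeps producing valid output when an optional element is added, but starts producing invalid output as soon as a required element is added.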

Context-dependent applications that transform instances for human consumption

Stylesheets are an example of such an application. A stylesheet may reasonably contain rules that are triggered on occurrence of patterns such as “If this is the first subelement, take thus-and-such an action.” Some kinds of schema change/inheritance will require the stylesheet to be changed in order to work properly. More generally, all output conversion programs have this problem as well, because context testing and respect for the original order of content are inherent in rendering for display for human consumption.

Context-independent applications that manipulate instances for data processing

Such applications are typically aligned with “XML for data” purposes. For example, a method that does processing on a property value will rely on certain facts provided by the portion of the schema that governs the value. Some kinds of schema change/inheritance will require the method to be changed in order to work properly; thus it is not perfectly reusable.
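As a rough sketch of such a method, assume a hypothetical order document in which every `item` has a required `price` property. The element names (`order`, `item`, `price`, `note`) and the function `total_price` are invented for illustration and do not come from any schema discussed in the WG.

```python
import xml.etree.ElementTree as ET


def total_price(order):
    """Sum item prices, relying on every <item> having a <price> child."""
    return sum(float(item.find("price").text) for item in order.findall("item"))


# Instance valid against the original (hypothetical) schema.
original = ET.fromstring(
    "<order><item><price>1.5</price></item><item><price>3.5</price></item></order>"
)
# Instance valid against a schema that adds an optional <note> property.
optional_added = ET.fromstring(
    "<order><item><price>1.5</price><note>gift</note></item>"
    "<item><price>3.5</price></item></order>"
)
# Instance valid against a schema that removes the required <price> property.
required_removed = ET.fromstring("<order><item/><item/></order>")

print(total_price(original))
print(total_price(optional_added))  # the extra property is simply ignored
try:
    total_price(required_removed)
except AttributeError:
    print("method fails: the property it depended on is gone")
```

The method never looks at order or position, so added data does not disturb it; only the loss of a fact it relied on does.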

Validation applications

This is another way of saying “schema-aware validating processors.” Note that reusability means something slightly different here. Think of the two sets of values that conform to T1 and T2, and assume that your environment is set up to validate against T1. If the set of values conforming to T2 includes any values that are invalid according to T1, your environment will need to change to accommodate T2; thus it is not perfectly reusable.


Chapter 3. Comparing Reusability of Different Application Classes

From a broad perspective, it may seem that there are only two operations that modify a schema's expressive power:

However, because of the power we expect the XML Schema language to have, we also need to take into account the following notions:

It turns out that all of these have an effect on reusability.


Example of a Non-Reusable Stylesheet

Because most WG members seem to have been concentrating on data-processing applications and the obvious “safety” of additive inheritance, the following example shows how context-dependent applications (for example, rendering applications) might become unreusable when faced with additive inheritance.

Assume that you have an original schema that imposes the following constraints (expressed in DTD notation):

<!ELEMENT doc (title, front?, body, back)>
<!ELEMENT front (author*, copyright)>
<!ELEMENT body (div+)>
<!ELEMENT div (title, p+)>
<!ELEMENT back (div+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT p (#PCDATA|emph)*>
<!ELEMENT emph (#PCDATA)>

A typical stylesheet instruction might attach vertical pre-space to the instance content that matches the pattern “the first paragraph in a division,” creating the necessary whitespace in the layout design that allows readers to recognize the start of division content.

If you want to create a very similar schema with a modified division element that inherits the content model of the original but adds to it, you may have a problem reusing your stylesheet code. If the division model is modified as follows, the pattern in your stylesheet may produce an inappropriate space before the first paragraph whenever a figure occurs before it:

<!ELEMENT div (title, (p|fig)+)>

It's possible to rewrite the pattern—if the stylesheet language allows it—as “the first arbitrary element after the title in a division.” However, stylesheet writers can't account for every possible schema change in their patterns. All it might take to break this more generalized version of the pattern is to allow or require a “division metadata container” as the first element after the title, whose content should be suppressed entirely.
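The two patterns can be approximated in Python to make the failure concrete. This is a sketch under stated assumptions, not a stylesheet implementation: the helper names are invented, and the element name `meta` is a hypothetical stand-in for the “division metadata container.”

```python
import xml.etree.ElementTree as ET


def first_paragraph(div):
    """Original pattern: 'the first paragraph in a division'."""
    return div.find("p")


def first_after_title(div):
    """Generalized pattern: 'the first arbitrary element after the title'."""
    children = list(div)
    return children[1] if len(children) > 1 else None


def starts_division_content(div, element):
    """True if the element is what the reader sees first after the title."""
    return list(div)[1] is element


original_div = ET.fromstring("<div><title>T</title><p>one</p><p>two</p></div>")
fig_div = ET.fromstring("<div><title>T</title><fig/><p>one</p></div>")
meta_div = ET.fromstring("<div><title>T</title><meta>hidden</meta><p>one</p></div>")

# Original schema: the matched paragraph really does start the content.
print(starts_division_content(original_div, first_paragraph(original_div)))
# (p|fig)+ model: the matched paragraph now follows a figure.
print(starts_division_content(fig_div, first_paragraph(fig_div)))
# The generalized pattern copes with the figure...
print(first_after_title(fig_div).tag)
# ...but selects the suppressed metadata container in the next variant.
print(first_after_title(meta_div).tag)
```

Each rewrite of the pattern fixes one schema variant and remains exposed to the next, which is the point of the example.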


Assessing Application Reusability

The following table summarizes the reusability of typical processing applications against instances when forced to deal with various kinds of schema modifications.

Creation and generation | Context-dependent processing | Context-independent processing | Validation
Add | Subtract | Add | Subtract | Add | Subtract | Add | Subtract
Required | Required element | Not reusable (app won't know to create it) | Not reusable (app won't know not to create it) | Depends on pattern-match usage | Not reusable (may depend on missing element) | Reusable (app simply ignores additional data) | Not reusable (app may depend on missing property) | Not reusable (reports invalidity for all values in the new set)
Required attribute | Reusable (att values don't form harmful part of content)
Optional | Optional element | Reusable (app simply won't create it) | Not reusable (app might create it wrongly) | Depends on pattern-match usage | Reusable (couldn't count on presence anyway) | Reusable (app simply ignores additional data) | Reusable (couldn't count on presence anyway) | Not reusable (reports invalidity for some values in the new set) | Reusable (reports validity for all values in the new set)
Optional attribute | Reusable (att values don't form harmful part of content)
Attribute value | Reusable (app simply won't use it) | Not reusable? (throws more or different exceptions than allowed?) | Not reusable? (app might not know how to handle it)


General Observations on Reusability

The first thing to note here is that reusability/substitutability is not an inherent property of a schema change; it depends on the type of application.

I believe that any complex modification can be interpreted as a sequence of primitive modifications. For example, making a formerly required element optional breaks down into two parts:

  1. Subtracting a required element

  2. Adding an optional element

If at any point the application cannot be reused as-is, the whole modification is not reusable.
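This composition rule can be written down directly. The sketch below transcribes the element entries of the context-independent processing column from the summary table; the dictionary layout and the function name `reusable` are assumptions made for illustration.

```python
# Reusability of primitive modifications for context-independent processing,
# transcribed from the element rows of the summary table.
CONTEXT_INDEPENDENT = {
    ("add", "required"): True,        # app simply ignores additional data
    ("subtract", "required"): False,  # app may depend on missing property
    ("add", "optional"): True,        # app simply ignores additional data
    ("subtract", "optional"): True,   # couldn't count on presence anyway
}


def reusable(steps, table):
    """A complex modification is reusable only if every primitive step is."""
    return all(table[step] for step in steps)


# Making a formerly required element optional, as two primitive steps.
required_to_optional = [("subtract", "required"), ("add", "optional")]

print(reusable(required_to_optional, CONTEXT_INDEPENDENT))
print(reusable([("add", "optional")], CONTEXT_INDEPENDENT))
```

Here the first step already fails for a context-independent application, so the whole modification is not reusable even though the second step is harmless on its own.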

The second thing to note is the strong effect that context dependence has on reusability. Currently, XML DTDs do not allow unordered content models, and the creation and validation columns reflect this state of affairs. If XML schemas end up with this capability, we would need to add columns for context-independent creation and validation and additional commonalities would emerge.


Effects on Valid Value Sets

Elaborating on the validation column, following are the effects on the actual valid value sets when a schema is modified. Murata Makoto's schema comparison tool could be used to assess this status.

Effects of property addition | Effects of property subtraction
Required | Required element | Disjoint; schema intersection is the empty set. New instances do not conform to original schema.
Required attribute
Optional | Optional element | New set is a strict superset; schema diff contains the new property and schema intersection contains the original schema. New instances do not conform to original schema. | New set is a strict subset; schema diff contains the original property and schema intersection contains the new schema. New instances conform to original schema.
Optional attribute
Attribute value
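These set relationships can be checked on a toy scale with ordinary Python sets. The sketch assumes that each schema is represented by an explicit, finite set of valid instances, with each instance reduced to a tuple of child element names. Real schemas generally admit unbounded value sets, so this illustrates the relationships only and is unrelated to how any actual schema comparison tool works.

```python
def relation(old, new):
    """Describe how the new valid value set relates to the old one."""
    if old.isdisjoint(new):
        return "disjoint"
    if new == old:
        return "identical"
    if new > old:
        return "strict superset"
    if new < old:
        return "strict subset"
    return "overlapping"


# Each instance is a tuple of child element names (illustrative encoding).
original = {("title", "p")}
optional_fig_added = {("title", "p"), ("title", "fig", "p")}
required_fig_added = {("title", "fig", "p")}

print(relation(original, optional_fig_added))  # adding an optional element
print(relation(optional_fig_added, original))  # subtracting an optional element
print(relation(original, required_fig_added))  # adding a required element
print(relation(required_fig_added, original))  # subtracting a required element
```

A validator set up for the original schema accepts every new instance only in the strict subset case, which matches the single “Reusable” entry in the validation column.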


Chapter 4. Conclusion

From this analysis, I conclude that: