Published 7 April 1999
| Revision History | ||
|---|---|---|
| Revision 0.1 | 13 March 1999 | |
| Sent for private review by a few WG members. | ||
| Revision 1.0 | 7 April 1999 | |
| Initial publication to the WG. | ||
Substitutability is interesting to our schema efforts because it is another way of saying “code reusability.” If a schema is changed as a result of transformation into a new schema or inheritance of a schema definition in order to create a similar (but not identical) one, it may harm substitutability. We need to examine the substitutability effects of common schema change scenarios in order to figure out where the code reuse “pot of gold” really lies, or if it even exists. In our discussions to date, substitutability has been used as a virtual synonym for additive inheritance (adding properties to a schema). In this paper, I show that this view is too simplistic.
First, we need a definition. At the March 1999 face-to-face meeting, Michael Sperberg-McQueen proposed the following definition: “The ability to substitute a value of type T1 wherever a value of type T2 is expected.” In order to examine the effect of schema modification on this quality, I would rather examine the entire set of valid values for T1 and T2. Thus, I propose the following definition (and shortcut):
Given an original schema type definition T1 and a particular application that operates on valid values conforming to it, substitutability is the generalized ability of all valid values conforming to a modified definition T2 to allow the same application to work as designed. For short, the definition T2 is substitutable for T1 in the context of this application.
Another way to cut the definition is to say that a particular application operating on a substitutable schema is “perfectly reusable” without change:
Given an original schema type definition T1 and a particular application that operates on valid values conforming to it, reusability is the generalized ability of the application to work as designed with all valid values conforming to a modified definition T2.
In these definitions, what I mean by “generalized” is that if an application feature reasonably makes use of every fact that the original schema definition can guarantee, it has either a zero chance (if reusable) or a non-zero chance (if not reusable) of failing to work properly if the schema changes. For example, if the original schema has a required element and the schema is changed to remove that element, any application feature that reasonably might depend on the presence of the element will be in trouble.
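The required-element example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `app_feature` and `reusable` are hypothetical names, the sample documents are invented, and a finite sample stands in for the (generally infinite) set of all valid values.

```python
import xml.etree.ElementTree as ET


def app_feature(doc_xml: str) -> str:
    """Hypothetical application feature: return the document title in upper case.

    It relies on the original schema's guarantee that <title> is present.
    """
    root = ET.fromstring(doc_xml)
    return root.find("title").text.upper()


def reusable(feature, sample_values) -> bool:
    """Return True if the feature works as designed for every sampled value."""
    for value in sample_values:
        try:
            feature(value)
        except AttributeError:
            # find() returned None: the element the feature counted on is missing.
            return False
    return True


# Invented samples: T1 requires <title>; the modified T2 has removed it.
t1_values = ["<doc><title>Report</title><body/></doc>"]
t2_values = ["<doc><body/></doc>"]

print(reusable(app_feature, t1_values))
print(reusable(app_feature, t2_values))
```

A passing sample cannot prove reusability, since it covers only some valid values; a single failing value, however, is enough to show that the application is not perfectly reusable.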
There are four common classes of application functionality that I will examine closely in this paper.
A number of creation scenarios care about substitutability. If you develop a generator that builds XML documents from various pieces of contributed data from a database, you rely on certain facts about the schema for the document type: that element A has to be placed in thus-and-such a location, that attribute B can have only one of three possible values, etc. Some kinds of schema change/inheritance will break your generator; it will start producing invalid values. Thus it is not perfectly reusable.
As another example, if you maintain a structured authoring and publishing environment based on a schema, certain kinds of schema change/inheritance will require you to upgrade that environment.
Stylesheets are an example of a context-dependent processing application. A stylesheet may reasonably contain rules that are triggered on occurrence of patterns such as “If this is the first subelement, take thus-and-such an action.” Some kinds of schema change/inheritance will require the stylesheet to be changed in order to work properly. More generally, all output conversion programs have this problem, because context testing and respect for the original order of content are inherent in rendering for display for human consumption.
Context-independent processing applications are typically aligned with “XML for data” purposes. For example, a method that does processing on a property value will rely on certain facts provided by the portion of the schema that governs the value. Some kinds of schema change/inheritance will require the method to be changed in order to work properly; thus it is not perfectly reusable.
Validation applications are another way of saying “schema-aware validating processors.” Note that reusability means something slightly different here. Think of the two sets of values that conform to T1 and T2, and assume that your environment is set up to validate against T1. If the set of values conforming to T2 includes any values that are invalid according to T1, your environment will need to change to accommodate T2; thus it is not perfectly reusable.
From a broad perspective, it may seem that there are only two operations that modify a schema's expressive power:
Property addition
Property subtraction (addition of a constraint)
However, because of the power we expect the XML Schema language to have, we also need to take into account the following notions:
Optional vs. required properties
Elements vs. attributes
Content models with significant vs. insignificant order
It turns out that all of these have an effect on reusability.
Because most WG members seem to have been concentrating on data-processing applications and the obvious “safety” of additive inheritance, the following example shows how context-dependent applications (for example, rendering applications) might become unreusable when faced with additive inheritance.
Assume that you have an original schema that imposes the following constraints (expressed in DTD notation):

    <!ELEMENT doc (title, front?, body, back)>
    <!ELEMENT front (author*, copyright)>
    <!ELEMENT body (div+)>
    <!ELEMENT div (title, p+)>
    <!ELEMENT back (div+)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT p (#PCDATA|emph)*>
    <!ELEMENT emph (#PCDATA)>

A typical stylesheet instruction might attach vertical pre-space to the instance content that matches the pattern “the first paragraph in a division,” creating the necessary whitespace in the layout design that allows readers to recognize the start of division content.
If you wanted to create a very similar schema with a modified division element that inherits the content model of the original but adds to it, you may have a problem reusing your stylesheet code. If the division model is modified as follows, the pattern in your stylesheet may produce an inappropriate space before the first paragraph whenever a figure occurs before it:
<!ELEMENT div (title, (p|fig)+)>
It's possible to rewrite the pattern—if the stylesheet language allows it—as “the first arbitrary element after the title in a division.” However, stylesheet writers can't account for every possible schema change in their patterns. All it might take to break this more generalized version of the pattern is to allow or require a “division metadata container” as the first element after the title, whose content should be suppressed entirely.
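To make the failure concrete, here is a minimal Python sketch that stands in for the stylesheet rule; it is not written in any real stylesheet language. The function `layout`, the `SPACE` marker, and the empty `<fig/>` element are assumptions made for illustration (the content model of `fig` is not given above).

```python
import xml.etree.ElementTree as ET


def layout(div_xml: str) -> list:
    """Emit child tags in order, inserting 'SPACE' before the first <p> child.

    This mimics the rule "attach vertical pre-space to the first paragraph
    in a division".
    """
    div = ET.fromstring(div_xml)
    first_p = div.find("p")  # first <p> that is a direct child of the division
    out = []
    for child in div:
        if child is first_p:
            out.append("SPACE")
        out.append(child.tag)
    return out


# Valid against the original model (title, p+): the space opens the division content.
original = layout("<div><title>T</title><p>a</p><p>b</p></div>")

# Valid against the modified model (title, (p|fig)+): the space now falls
# between the figure and the paragraph instead of opening the division content.
modified = layout("<div><title>T</title><fig/><p>a</p></div>")

print(original)
print(modified)
```

The rule still fires on the modified instance, which is exactly the problem: nothing fails visibly, but the whitespace no longer marks the start of the division content.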
The following table summarizes the reusability of typical processing applications against instances when forced to deal with various kinds of schema modifications.
| | | Creation and generation | | Context-dependent processing | | Context-independent processing | | Validation | |
|---|---|---|---|---|---|---|---|---|---|
| | | Add | Subtract | Add | Subtract | Add | Subtract | Add | Subtract |
| Required | Required element | Not reusable (app won't know to create it) | Not reusable (app won't know not to create it) | Depends on pattern-match usage | Not reusable (may depend on missing element) | Reusable (app simply ignores additional data) | Not reusable (app may depend on missing property) | Not reusable (reports invalidity for all values in the new set) | |
| Required attribute | Reusable (att values don't form harmful part of content) | ||||||||
| Optional | Optional element | Reusable (app simply won't create it) | Not reusable (app might create it wrongly) | Depends on pattern-match usage | Reusable (couldn't count on presence anyway) | Reusable (app simply ignores additional data) | Reusable (couldn't count on presence anyway) | Not reusable (reports invalidity for some values in the new set) | Reusable (reports validity for all values in the new set) |
| Optional attribute | Reusable (att values don't form harmful part of content) | ||||||||
| Attribute value | Reusable (app simply won't use it) | Not reusable? (throws more or different exceptions than allowed?) | Not reusable? (app might not know how to handle it) | ||||||
The first thing to note here is that reusability/substitutability is not an inherent property of a schema change; it depends on the type of application.
I believe that any complex modification can be interpreted as a series of primitive modifications. For example, making a formerly required element optional breaks down into two parts:
Subtracting a required element
Adding an optional element
If at any point the application cannot be reused as-is, the whole modification is not reusable.
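This composition rule can be sketched as a lookup over the summary table. The dictionary below copies only the element rows whose verdict is unambiguous; the key names and the function `modification_reusable` are hypothetical, and the context-dependent “depends on pattern-match usage” entries are deliberately left out.

```python
# (application class, operation, property kind) -> reusable?
# Values transcribed from the element rows of the summary table.
REUSABILITY = {
    ("creation", "subtract", "required element"): False,
    ("creation", "add", "optional element"): True,
    ("context-independent", "subtract", "required element"): False,
    ("context-independent", "add", "optional element"): True,
    ("validation", "subtract", "required element"): False,
    ("validation", "add", "optional element"): False,
}


def modification_reusable(app_class: str, steps: list) -> bool:
    """A complex modification is reusable only if every primitive step is."""
    return all(REUSABILITY[(app_class, op, prop)] for op, prop in steps)


# Making a required element optional = subtract required, then add optional.
make_optional = [
    ("subtract", "required element"),
    ("add", "optional element"),
]

for app_class in ("creation", "context-independent", "validation"):
    print(app_class, modification_reusable(app_class, make_optional))
```

Because the first step is not reusable for any of these application classes, the whole modification is not reusable, even where the second step on its own would be harmless.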
The second thing to note is the strong effect that context dependence has on reusability. Currently, XML DTDs do not allow unordered content models, and the creation and validation columns reflect this state of affairs. If XML schemas end up with this capability, we would need to add columns for context-independent creation and validation and additional commonalities would emerge.
Elaborating on the validation column, following are the effects on the actual valid value sets when a schema is modified. Murata Makoto's schema comparison tool could be used to assess this status.
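| | | Effects of property addition | Effects of property subtraction |
|---|---|---|---|
| Required | Required element | Disjoint; schema intersection is the empty set. New instances do not conform to original schema. | Disjoint; schema intersection is the empty set. New instances do not conform to original schema. |
| | Required attribute | Disjoint; schema intersection is the empty set. New instances do not conform to original schema. | Disjoint; schema intersection is the empty set. New instances do not conform to original schema. |
| Optional | Optional element | New set is a strict superset; schema diff contains the new property and schema intersection contains the original schema. New instances do not conform to original schema. | New set is a strict subset; schema diff contains the original property and schema intersection contains the new schema. New instances conform to original schema. |
| | Optional attribute | New set is a strict superset; schema diff contains the new property and schema intersection contains the original schema. New instances do not conform to original schema. | New set is a strict subset; schema diff contains the original property and schema intersection contains the new schema. New instances conform to original schema. |
| | Attribute value | New set is a strict superset; schema diff contains the new property and schema intersection contains the original schema. New instances do not conform to original schema. | New set is a strict subset; schema diff contains the original property and schema intersection contains the new schema. New instances conform to original schema. |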
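The set relationships in this table can be illustrated with small finite value sets. In the sketch below each “value” is just the sequence of child element names of `doc`, enumerated by hand from the content model `(title, front?, body, back)`; the function `relation` and the added `index` element are hypothetical. This is not Murata Makoto's tool, only a toy comparison of enumerated sets.

```python
def relation(original: set, modified: set) -> str:
    """Classify the modified value set relative to the original one."""
    if original.isdisjoint(modified):
        return "disjoint"
    if modified == original:
        return "equal"
    if modified > original:
        return "strict superset"
    if modified < original:
        return "strict subset"
    return "overlapping"


# All child sequences allowed by (title, front?, body, back).
original = {
    ("title", "body", "back"),
    ("title", "front", "body", "back"),
}

# Add a required element (hypothetical 'index' at the end).
add_required = {seq + ("index",) for seq in original}

# Add an optional element (the same hypothetical 'index', now optional).
add_optional = original | add_required

# Subtract the optional 'front' element.
subtract_optional = {("title", "body", "back")}

for name, modified in [
    ("add required", add_required),
    ("add optional", add_optional),
    ("subtract optional", subtract_optional),
]:
    # A validator built for the original schema stays reusable only if
    # every new value is also an original value.
    print(name, relation(original, modified), modified <= original)
```

Only the strict-subset case leaves an existing validator untouched, which matches the validation column of the earlier summary table.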
From this analysis, I conclude that:
We cannot take on “substitutability” as a goal when designing inheritance or reuse characteristics into our schema language; this would presume to know more about schema writers' application space than we have any way of knowing.
We cannot prefer additive inheritance over subtractive inheritance in the language; this would prejudice the kinds of applications that benefit from schema and application code reuse.
Schema designers will have to take their own responsibility for the effects of schema modification; all we can do is educate.