Substitution rules must be in V1.0

I had an idea about compatibility that seems really interesting to me. I got to thinking about why it is that we can talk about format extensibility and versioning, and provide some guidance on how to achieve distributed compatible evolution. Lots of people have talked to me about different models for dealing with extensions. Eve's paper on type substitution really struck me, and I'll be posting her paper shortly.

My realization is thus: distributed compatible evolution can only be achieved if substitution rules for extensions are in V 1.0. If you don't provide substitution rules, you can't get compatible evolution.

In order to achieve compatible versioning, a language (format or protocol) must allow for extensibility in instances of the language. That is, it allows instances that it doesn't fully understand. This is typically embodied in a rule such as "allow extensibility". I think of this as two sets of instances: the known set and the allowed set.

Then there has to be some kind of rule about what to do with these extensions. One darned good rule is to ignore the extensions. The expression with extensions is transformed into an expression where everything is known. The "must ignore unknown extensions" rule effects a substitution rule that says how one instance is substituted with another.
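Here is a minimal sketch of the must-ignore rule as a substitution. It assumes instances are modeled as plain Python dictionaries and that the V1.0 vocabulary is a fixed set of field names; the names `KNOWN_FIELDS` and `must_ignore_unknown` are made up for illustration and don't come from any specification.

```python
# Hypothetical V1.0 vocabulary: the "known set" of field names.
KNOWN_FIELDS = {"name", "email"}


def must_ignore_unknown(instance: dict) -> dict:
    """Substitute an instance from the allowed set with one from the known set.

    Any field the V1.0 receiver does not understand is dropped, so the
    result contains only known fields.
    """
    return {key: value for key, value in instance.items() if key in KNOWN_FIELDS}


# A newer sender adds a "phone" extension that a V1.0 receiver has never seen.
v2_instance = {"name": "Ada", "email": "ada@example.org", "phone": "555-0100"}
v1_view = must_ignore_unknown(v2_instance)
print(v1_view)
```

The V1.0 receiver never needs to know what "phone" means; the rule it shipped with is enough to turn the extended instance into one it can work on.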

There are other very useful rules that typically have more elaborate substitution rules. XSLT provides a substitution rule where if an element isn't understood, then a fallback rule is applied and the fallback contains something that is understood.
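A fallback rule can be sketched in the same spirit. This is only loosely modeled on the idea and is not how an XSLT processor is implemented: elements are assumed to be dictionaries with a "name" and an optional "fallback" entry, and `KNOWN_ELEMENTS` and `resolve` are hypothetical names.

```python
# Hypothetical set of element names the receiver understands.
KNOWN_ELEMENTS = {"text", "bold"}


def resolve(element: dict) -> dict:
    """Return the element if understood; otherwise walk its fallback chain."""
    current = element
    while current["name"] not in KNOWN_ELEMENTS:
        fallback = current.get("fallback")
        if fallback is None:
            raise ValueError(f"no understood fallback for {element['name']!r}")
        current = fallback
    return current


# The sender uses a newer "sparkle" element but supplies a fallback
# expressed in terms the older receiver already knows.
fancy = {
    "name": "sparkle",
    "content": "hi",
    "fallback": {"name": "bold", "content": "hi"},
}
print(resolve(fancy)["name"])
```

Unlike must-ignore, the substitute here is supplied by the sender, so the receiver only needs the rule "use the fallback" built in from the start.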

In programming languages, the substitution is typically done using polymorphism, where rules define how one type is transformed into another. But this model doesn't work when software that understands only the old type gets an instance of the new type, as it often doesn't know how to transform the instance of the new type into an instance of the old type. This model, where the new type and substitution rules are not available to the client, is what I call "touchless" extensibility.

Now some folks have pushed back and said "well, why not send the new type with the instance?" The problem is twofold: how is the new type identified as substitutable for an older type, and how is that substitution done? If type B extends type A, how is an instance of B identified as an instance of A, and how is B transformed into A? Perhaps the extensions should be ignored. But this type identification and transformation isn't in XML Schema today.

We could add an "xsi:baseType=a" to identify the type and also provide a transformation with each instance of B. Maybe an XSLT stylesheet. This is getting darned close to what XSLT did for itself, FWIW. At any rate, there is no momentum to add this kind of functionality to XML Schema.
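To make the two halves of the problem concrete, here is a rough sketch of what a receiver could do if instances carried a base-type hint. Everything here is an assumption for illustration: the "baseType" key stands in for the hypothetical xsi:baseType idea (which does not exist in XML Schema), the transformation is simply "ignore the extension fields", and `KNOWN_TYPES` and `substitute` are invented names.

```python
# Types the old receiver understands, with the fields each one defines.
KNOWN_TYPES = {"A": ("id", "title")}


def substitute(instance: dict) -> dict:
    """Transform an instance into one of a type the receiver understands."""
    type_name = instance["type"]
    if type_name not in KNOWN_TYPES:
        # Identification: fall back to the base type the sender declared, if any.
        type_name = instance.get("baseType")
        if type_name not in KNOWN_TYPES:
            raise ValueError("instance is not substitutable for any known type")
    # Transformation: keep only the fields the known type defines.
    fields = KNOWN_TYPES[type_name]
    projected = {name: instance["fields"][name] for name in fields}
    return {"type": type_name, "fields": projected}


# Type B extends A with a "rating" field the old receiver has never seen.
b_instance = {
    "type": "B",
    "baseType": "A",
    "fields": {"id": 7, "title": "Substitution", "rating": 5},
}
print(substitute(b_instance))
```

The point is that both steps, identifying B as substitutable for A and turning B into A, have to be something the receiver already knew how to do before B existed.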

I was really intrigued by the Semantic Web and RDF/OWL. It turns out that RDF/OWL do have both the extensibility and substitutability rules. There is some funky syntax that can be sent with an instance that specifies a transformation to a different version. It seems that RDF/OWL have provided properly for distributed extensibility in V 1.0, and good on them.

I'll observe that this is arguably why XML 1.1 is not backwards compatible with XML 1.0. XML 1.0 did not allow for the extra characters that XML 1.1 allows, nor did XML 1.0 provide substitution rules for transforming the extra XML 1.1 characters into something understood by an XML 1.0 processor. Both necessary aspects were missing in XML 1.0, hence compatible evolution is not possible.

The summary of this posting is that the substitution rules have to be specified in the very first version of a language in order to achieve distributed extensibility. The substitution rules could be static, like "must ignore unknown extensions", or dynamic, like an XSLT or OWL transform. If the substitutability rule(s) aren't in the very first version of a language, there's no way an instance with extensions can be transformed into an instance that the software can work on.

If you are providing a language that you believe requires distributed extensibility, you really do need to plan for extensibility and substitutability.

5 Comments

WRT RDF rules - they're funky, but very powerful!

:)

Does this mean that you think XML 1.1 really should have been called XML 2.0 (because most people see .x releases as backwards-compatible)?

"My realization is thus: distributed compatible evolution can only be achieved if substitution rules for extensions are in V 1.0. If you don't provide substitution rules, you can't get compatible evolution."

In theory, yes, exactly. In practice, partly right, since there are often other extension points that can be leveraged. For example, consider how SOAP or RFC 2774 add mandatory extension capabilities to HTTP in two different ways without requiring a brand new protocol or a version rev. But that also comes with costs (e.g. have to use SOAP headers instead of HTTP headers); It's certainly best to get it done in v1.0 if you can.

P.S. search the w3c-html-wg@w3.org archives for "mandatory extension xhtml" to see my (failed) attempt to get mandatory extensions into XHTML Modularization 1.0 back in '99/00.

BTW, good to see the acknowledgement of RDF. Now maybe you can appreciate why I've been promoting it as strongly as I have. "RDF/XML; putting the 'X' in 'XML'" 8-)

MarkN: You are probably right that XML 1.1 should have been called 2.0 because of being an incompatible change.

MarkB: interesting comparison between extensibility and soap/http. One way of looking at soap/http is that soap was created to get an extensibility mechanism through firewalls via http's POST body extensibility mechanism. If http had not provided POST, it's unclear whether soap would have taken off. I'd argue that soap does provide a missing substitution mechanism, namely "must ignore unknown" or "can't ignore". Further, if HTTP had a better header extensibility mechanism for XML, and XML had support for mustIgnore, I'm really not sure how much SOAP would have taken off.

As for RDF/OWL, you see the difference in our approaches. I look at a variety of solutions in the context of a particular problem - distributed evolution - and compare them. Then pros and cons become clearer via the comparison mechanism. It's just engineering trade-offs of various designs against problem spaces, though often affected by societal forces.

Hey,

I don't totally follow your HTTP/SOAP comment, but it sounds about right.

Re RDF, I think the difference in our approaches is that I studied a variety of solutions to the distributed evolution problem a few years ago, and concluded that RDF provided a good enough solution to it, whereas you seem to be going through that same investigation now. Better late than never! (just kidding! 8-) 8-) I'm confident you'll come to the same conclusion I did. Cheers.

Re. RDF/OWL - "...funky syntax that can be sent with an instance that specifies a transformation to a different version..."

What funky syntax in particular have you in mind?

About this Entry

This page contains a single entry by Dave Orchard published on May 26, 2004 4:17 PM.
