The rationale for extensibility and versioning is explained in the TAG finding that Norm and I have been working on for some time. This note shows how to use RDF and OWL to describe structures, and compares this with XML Schema. In summary, I believe that there are some positive aspects of instance and schema evolution in a distributed environment that RDF/OWL can accurately describe and constrain.
The main goal is to promote what I call distributed touchless evolution for application development. Distributed touchless extensibility and versioning is fairly well defined in the finding, but a rough summary is that it promotes loose coupling of systems by allowing one side of a message exchange to evolve or change without requiring the opposite side to change, or be "touched", in any way. A sender could add optional content to a message and its type, and the receiver can process the newer messages without changing. Kind of the way much of the Web works. We'd also like to be able to write a V2 schema using the new optional content, and have V1 and V2 instances validate against it. XML Schema supports some parts of distributed touchless extensibility and I'll use it as a point of comparison.
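A minimal sketch of the receiving side of that idea, assuming a `name` message in the namespace `http://www.example.com/ns/1.0` and made-up values; the function `read_name_v1` is hypothetical. The receiver was written against V1, picks out only the content it knows, and ignores the rest:

```python
import xml.etree.ElementTree as ET

NS = "http://www.example.com/ns/1.0"  # assumed namespace for the name vocabulary


def read_name_v1(xml_text):
    """V1 receiver: read the content it knows about and ignore everything else."""
    root = ET.fromstring(xml_text)
    first = root.findtext(f"{{{NS}}}first")
    last = root.findtext(f"{{{NS}}}last")
    return first, last


v1_message = f'<name xmlns="{NS}"><first>Jane</first><last>Doe</last></name>'

# A newer sender adds optional content that the V1 receiver has never seen.
v2_message = (
    f'<name xmlns="{NS}"><first>Jane</first>'
    f"<middle>Q</middle><last>Doe</last></name>"
)

print(read_name_v1(v1_message))  # ('Jane', 'Doe')
print(read_name_v1(v2_message))  # ('Jane', 'Doe')
```

The receiver is untouched by the sender's change. Whether the newer message also *validates* against the V1 schema is a separate question.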
The comparison is best illustrated with an example. The V1 schema for <name> consists of a <first>, a <last>, and allows optional undefined extensions. The V2 schema adds a <middle>, retains the optional undefined extensibility, and does not allow an <areacode> between the <middle> and the <last>. The point of V2 is that middle names and area codes have now been defined, along with where each of them can occur.
Under the V1 schema, the following XML instances are legal:
The PSVI will have the type information of the first and last name.
And the following instances are illegal:
In XML Schema, this looks something like:
We'd prefer the ##other to be ##any to allow extension in the same namespace, but that isn't legal W3C Schema. Maybe in Schema 1.1, but not 1.0.
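To make the V1 rule concrete, here is a rough Python sketch of the content model rather than an XML Schema: `first`, then `last`, then any number of elements from other namespaces, mirroring `namespace="##other"`. The target namespace, the extension namespace `http://www.example.org/ext` and the function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://www.example.com/ns/1.0"  # assumed target namespace
EXT = "http://www.example.org/ext"    # hypothetical third-party extension namespace


def namespace_of(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def valid_v1(xml_text):
    """name = first, last, then zero or more elements from *other* namespaces."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{NS}}}name":
        return False
    children = list(root)
    expected = [f"{{{NS}}}first", f"{{{NS}}}last"]
    if [child.tag for child in children[:2]] != expected:
        return False
    # Like ##other in Schema 1.0: neither the target namespace nor no namespace.
    return all(namespace_of(child) not in ("", NS) for child in children[2:])


head = f'<name xmlns="{NS}" xmlns:e="{EXT}">'
print(valid_v1(head + "<first>Jane</first><last>Doe</last></name>"))
print(valid_v1(head + "<first>Jane</first><last>Doe</last><e:title>Dr</e:title></name>"))
print(valid_v1(head + "<first>Jane</first></name>"))
print(valid_v1(head + "<first>Jane</first><last>Doe</last><middle>Q</middle></name>"))
```

The first two instances pass and the last two fail. The final one is the case the `##any` wish is about: an extension in the same namespace is rejected.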
Under V2 schema, the following instances are legal:
It would be nice, but not required, that elements that are known in the schema be illegal in the extensibility point, i.e. the following would be illegal. It is extremely difficult in XML Schema to forbid certain elements in an extensibility point at the end of a structure.
And the following instances are illegal:
The following XML Schema shows roughly what is desired:
After XML Schema validation, the PSVI will have the type information of the first, last and middle name.
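Since a content model is essentially a regular expression over child elements, the V1 and V2 models can be compared in a few lines of Python. This is a sketch under assumptions, not the schema itself: the namespace, the one-letter encoding and the choice to make `middle` optional (so that V1 instances still validate under V2) are mine.

```python
import re
import xml.etree.ElementTree as ET

NS = "http://www.example.com/ns/1.0"  # assumed target namespace

# One letter per child: F=first, M=middle, L=last, A=areacode,
# X=element from another namespace, ?=anything else.
KNOWN = {"first": "F", "middle": "M", "last": "L", "areacode": "A"}


def tokens(xml_text):
    """Encode the children of the root element as a string of letters."""
    out = []
    for child in ET.fromstring(xml_text):
        if child.tag.startswith(f"{{{NS}}}"):
            out.append(KNOWN.get(child.tag.split("}", 1)[1], "?"))
        elif child.tag.startswith("{"):
            out.append("X")
        else:
            out.append("?")
    return "".join(out)


V1_MODEL = re.compile(r"FLX*")
V2_MODEL = re.compile(r"FM?LX*")  # middle is optional, so V1 instances still match


def valid(model, xml_text):
    return model.fullmatch(tokens(xml_text)) is not None


head = f'<name xmlns="{NS}" xmlns:e="http://www.example.org/ext">'
v1 = head + "<first>Jane</first><last>Doe</last></name>"
v2 = head + "<first>Jane</first><middle>Q</middle><last>Doe</last><e:title>Dr</e:title></name>"
bad = head + "<first>Jane</first><middle>Q</middle><areacode>604</areacode><last>Doe</last></name>"

print(valid(V2_MODEL, v1), valid(V2_MODEL, v2), valid(V2_MODEL, bad))  # True True False
print(valid(V1_MODEL, v2))  # False
```

The last line is the XML Schema side of the comparison: a V2 instance with `middle` between `first` and `last` does not validate against the V1 model.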
Now I'm trying to get the same kind of expressibility in RDF and OWL. I create OWL and RDF constructs to model the construct and show the instance.
As I understand RDF/OWL - inasmuch as a fish understands breathing air without water - there are a number of things I need to do to create something like the equivalent. Firstly, I need to use unordered properties rather than sequences. Let's see what I have to do in total, using N3 format:
1. I need to create my namespaces. We need owl, rdf, RDF Schema, xsd, myns, and a log prefix. Watch out for whitespace in the prefixes. A sample is:
2. I create a Name type that is an OWL Class. I call it "my Name" and I give a documentation reference to it.
3. I also have to say that Name has 2 properties, first and last, that have cardinality of 1. This is done by saying that there is a restriction on cardinality on the properties. I have no idea why the type (nonNegativeInteger) of the cardinality value has to be sent in an instance. But that's the least of the things that confuse me :-) I also have two choices for expressing cardinality constraints, class local and global. Global cardinality constraints only work for maxCardinality=1 and are expressed by declaring the property in question to be of type owl:FunctionalProperty. All other cardinality constraints are on a per class basis. I choose class constraints.
4. I have to define the types for first name and last name. I'll say that they are datatype properties on Name, with labels First and Last, and type string. This seems analogous to constructing an XML Schema where the First and Last names were their own types, instead of names for the built-in schema string type. Seems like a useful shorthand in Schema compared to RDF/OWL. I could have chosen ObjectProperties, which are good for adding OWL axioms, but I want to add more XML Schema structure, hence DatatypeProperty. First name looks like:
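As a rough Python sketch rather than N3, steps 2 to 4 amount to a handful of triples: a Name class, two cardinality restrictions attached through rdfs:subClassOf, and two datatype properties. The `my:` names, the blank-node labels and the helper functions are assumptions for illustration. The check at the end only counts asserted triples, which is a closed-world shortcut and not how an OWL reasoner such as Pellet works.

```python
# Triples as (subject, predicate, object) tuples, using prefixed names as strings.
ONTOLOGY = {
    ("my:Name", "rdf:type", "owl:Class"),
    ("my:Name", "rdfs:label", "my Name"),
    # Class-local restriction: exactly one my:first.
    ("my:Name", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", "my:first"),
    ("_:r1", "owl:cardinality", 1),
    # Class-local restriction: exactly one my:last.
    ("my:Name", "rdfs:subClassOf", "_:r2"),
    ("_:r2", "rdf:type", "owl:Restriction"),
    ("_:r2", "owl:onProperty", "my:last"),
    ("_:r2", "owl:cardinality", 1),
    # The properties themselves are string-valued datatype properties.
    ("my:first", "rdf:type", "owl:DatatypeProperty"),
    ("my:first", "rdfs:label", "First"),
    ("my:first", "rdfs:range", "xsd:string"),
    ("my:last", "rdf:type", "owl:DatatypeProperty"),
    ("my:last", "rdfs:label", "Last"),
    ("my:last", "rdfs:range", "xsd:string"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def cardinalities(graph, cls):
    """Map each restricted property of cls to its exact cardinality."""
    result = {}
    for node in objects(graph, cls, "rdfs:subClassOf"):
        for prop in objects(graph, node, "owl:onProperty"):
            for n in objects(graph, node, "owl:cardinality"):
                result[prop] = n
    return result


def asserted_ok(graph, cls, data, individual):
    """Count asserted values per restricted property; other properties are ignored."""
    return all(
        len(objects(data, individual, prop)) == n
        for prop, n in cardinalities(graph, cls).items()
    )


DATA = {
    ("my:n1", "my:first", "Jane"), ("my:n1", "my:last", "Doe"),
    ("my:n2", "my:first", "Jane"), ("my:n2", "my:middle", "Q"), ("my:n2", "my:last", "Doe"),
}

print(asserted_ok(ONTOLOGY, "my:Name", DATA, "my:n1"))  # True
print(asserted_ok(ONTOLOGY, "my:Name", DATA, "my:n2"))  # True: my:middle is simply not constrained
```

Note that nothing in the class mentions extensibility: the unknown `my:middle` passes because the model only says what it restricts.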
Combining these together, I get the V1 ontology (available in N3 at http://www.pacificspirit.com/Authoring/Compatibility/v1ont.n3 and also in OWL/XML at http://www.pacificspirit.com/Authoring/Compatibility/v1ont.owl):
This was validated using Pellet.
Looking at the validation results, we see a couple things.
1. Pellet didn't know the type triple of the middle name. It really likes to have these, so it guesses that the middle name is a datatype property. That's why the warning "Assuming http://www.example.com/ns/1.0#middle is a datatype property" comes up.
2. The classification tree shows both names are of class Name.
I then add the middle name and area codes to the #name. This time I'll use RDF/XML syntax, available at http://www.pacificspirit.com/Authoring/Compatibility/v2ont.owl, because it's soo much fun.
In order to prevent an area code, I need to add the area code with a cardinality of 0. Now I think this is a pretty big problem. The whole Semantic Web world view of open content models comes and bites us here. The rough assumption is that if a property isn't specifically excluded, it might be related to the thing. But in large enterprise systems, we know that is not the case. When we talk about purchase orders and manufacturing systems, we don't assume that a PO might have a "customer". If the PO doesn't have a customer, then it's not by accident or omission. As Bijan puts it, it's better to think of RDF/OWL classes as *following* data rather than prescribing it.
The net of this is that it is very cumbersome to design a large schema, and then go through and exclude each of the defined items from all the extensibility points.
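A small sketch of why this gets cumbersome, using a hypothetical table of class-local limits for a V2 Name. The `(min, max)` values, in particular `middle` as `(0, 1)`, are assumptions; only the cardinality of 0 on the area code comes from the discussion above. As in any count over asserted data, this is a closed-world approximation and not a reasoner.

```python
# Class-local limits for a hypothetical V2 Name: property -> (min, max).
# A max of None means "no upper bound".
V2_NAME = {
    "first": (1, 1),
    "middle": (0, 1),
    "last": (1, 1),
    "areacode": (0, 0),  # the only way to forbid a property is to name it
}


def violations(limits, properties):
    """Return the restricted properties whose asserted count is out of range."""
    bad = []
    for prop, (low, high) in limits.items():
        count = len(properties.get(prop, []))
        if count < low or (high is not None and count > high):
            bad.append(prop)
    return bad


with_areacode = {"first": ["Jane"], "last": ["Doe"], "areacode": ["604"]}
with_postcode = {"first": ["Jane"], "last": ["Doe"], "postcode": ["V6B"]}

print(violations(V2_NAME, with_areacode))  # ['areacode']
print(violations(V2_NAME, with_postcode))  # []
```

The area code is rejected only because it was listed. The equally unwanted `postcode` (a made-up property) slips through, so every defined item would need its own zero-cardinality entry on every class where it must not appear.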
In the same way that Schema allows us to change the type definition for a given namespace name, RDF/OWL allows us to update v1ont if we choose, rather than create a v2ont.
It is my understanding that the OWL properties priorVersion and backwardCompatibleWith are not appropriate for individual classes, as they are intended for languages as a whole. There are some class and property annotation properties, but they have no defined semantics.
In comparing RDF/OWL to W3C Schema from an extensibility/versioning perspective, there are significant differences between them. This comparison doesn't attempt to look at the other huge areas of interest that each architecture covers.
XML Schema has well documented issues with how to allow default extensibility and how to perform substitution of one thing for another (i.e. a V2 instance of name in software that only understands V1). RDF/OWL makes extensibility easier, as an OWL class doesn't require explicit wildcards or extensibility points. It has a more open content model, so items can be added in the middle of the thing. RDF/OWL seems to have "must ignore unknowns" built in, as adding an extra item (like middle) to an instance still validates using the V1 class.
Neither OWL/RDF nor Schema makes it easy to specify which things or properties can go where in a content model, particularly when excluding certain things. Neither allows an extension author to say that an extension is required, or to provide an author-controlled substitution model for converting a V2 type into a V1 type if V2 isn't supported. As confusing as I typically find XML Schema syntax, I found the RDF/OWL syntax even more so. But I tend to like strongly typed items (i.e. <owl:Thing> doesn't do much for me) so that's no surprise.
If we think of the main point of a schema language as defining the language for exchanging information, it seems that RDF/OWL is easier to use for extensibility and versioning. Which might be no surprise given the design centres. But given the inability to control the schemas in all the right facets - such as mandatory extensions - it doesn't fully solve the problems of large scale distributed system extensibility and versioning. More work to be done....